Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-30 Thread Ayaka


Sent from my iPad

> On Jan 30, 2019, at 3:17 PM, Tomasz Figa  wrote:
> 
>> On Wed, Jan 30, 2019 at 3:28 PM Ayaka  wrote:
>> 
>> 
>> 
>> Sent from my iPad
>> 
>>> On Jan 30, 2019, at 11:35 AM, Tomasz Figa  wrote:
>>> 
>>> On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
>>>  wrote:
 
>> On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne  
>> wrote:
>> 
>> Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
>> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
>>  wrote:
>>> Hi,
>>> 
 On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
 Sent from my iPad
 
> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
>  wrote:
> 
> Hi,
> 
>> On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
>> I forgot an important thing: the rkvdec and RK HEVC decoders request
>> the CABAC table, scaling list, picture parameter set and reference
>> pictures to be stored in one or several DMA buffers. I am not talking
>> about the parsed data; the decoder requests the raw data.
>> 
>> For the PPS and RPS, it is possible to reuse the slice header: just let
>> the decoder know the offset from the bitstream buffer. I would suggest
>> adding three properties (with the SPS) for them. But I think we need a
>> method to mark an OUTPUT-side buffer for those aux data.
> 
> I'm quite confused about the hardware implementation then. From what
> you're saying, it seems that it takes the raw bitstream elements 
> rather
> than parsed elements. Is it really a stateless implementation?
> 
> The stateless implementation was designed with the idea that only the
> raw slice data should be passed in bitstream form to the decoder. For
> H.264, it seems that some decoders also need the slice header in raw
> bitstream form (because they take the full slice NAL unit), see the
> discussions in this thread:
> media: docs-rst: Document m2m stateless video decoder interface
 
Stateless just means it won't track the previous result, but I don't
think you can define what data the hardware would need. Even if you
just build a DPB for the decoder, it is still stateless; parsing
less or more data from the bitstream doesn't stop a decoder from
being a stateless decoder.
>>> 
>>> Yes fair enough, the format in which the hardware decoder takes the
>>> bitstream parameters does not make it stateless or stateful per-se.
>>> It's just that stateless decoders should have no particular reason for
>>> parsing the bitstream on their own since the hardware can be designed
>>> with registers for each relevant bitstream element to configure the
>>> decoding pipeline. That's how GPU-based decoders are implemented
>>> (VAAPI/VDPAU/NVDEC, etc.).
>>> 
>>> So the format we have agreed on so far for the stateless interface is
>>> to pass parsed elements via v4l2 control structures.
>>> 
>>> If the hardware can only work by parsing the bitstream itself, I'm not
>>> sure what the best solution would be. Reconstructing the bitstream in
>>> the kernel is a pretty bad option, but so is parsing in the kernel or
>>> having the data both in parsed and raw forms. Do you see another
>>> possibility?
>> 
>> Is reconstructing the bitstream so bad? The v4l2 controls provide a
>> generic interface to an encoded format which the driver needs to
>> convert into a sequence that the hardware can understand. Typically
>> this is done by populating hardware-specific structures. Can't we
>> consider that in this specific instance, the hardware-specific
>> structure just happens to be identical to the original bitstream
>> format?
> 
> At the maximum allowed bitrate for, let's say, HEVC (940MB/s iirc), yes,
> it would be really, really bad. In the GStreamer project we have
> discussed for a while (but never did anything about it) adding the
> ability, through a bitmask, to select which parts of the stream need to
> be parsed, as parsing itself was causing some overhead. Maybe a similar
> thing applies here, though as per our new design, it's the fourcc that
> dictates the driver behaviour; we'd need yet another fourcc for drivers
> that want the full bitstream (which seems odd if you have already parsed
> everything; I think this needs some clarification).
 
 Note that I am not proposing to rebuild the *entire* bitstream
 in-kernel. What I am saying is that if the hardware interprets some
 structures (like SPS/PPS) in their raw format, this raw format could
 be reconstructed from the structures passed by userspace at negligible
 cost. Such 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Maxime Ripard
On Wed, Jan 30, 2019 at 12:35:41PM +0900, Tomasz Figa wrote:
> On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
>  wrote:
> >
> > On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne  
> > wrote:
> > >
> > > Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
> > > > On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> > > >  wrote:
> > > > > Hi,
> > > > >
> > > > > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > > > > > Sent from my iPad
> > > > > >
> > > > > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
> > > > > > >  wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > > > > I forgot an important thing: the rkvdec and RK HEVC
> > > > > > > > decoders request the CABAC table, scaling list, picture
> > > > > > > > parameter set and reference pictures to be stored in one
> > > > > > > > or several DMA buffers. I am not talking about the parsed
> > > > > > > > data; the decoder requests the raw data.
> > > > > > > >
> > > > > > > > For the PPS and RPS, it is possible to reuse the slice
> > > > > > > > header: just let the decoder know the offset from the
> > > > > > > > bitstream buffer. I would suggest adding three properties
> > > > > > > > (with the SPS) for them. But I think we need a method to
> > > > > > > > mark an OUTPUT-side buffer for those aux data.
> > > > > > >
> > > > > > > I'm quite confused about the hardware implementation then. From 
> > > > > > > what
> > > > > > > you're saying, it seems that it takes the raw bitstream elements 
> > > > > > > rather
> > > > > > > than parsed elements. Is it really a stateless implementation?
> > > > > > >
> > > > > > > The stateless implementation was designed with the idea that only 
> > > > > > > the
> > > > > > > raw slice data should be passed in bitstream form to the decoder. 
> > > > > > > For
> > > > > > > H.264, it seems that some decoders also need the slice header in 
> > > > > > > raw
> > > > > > > bitstream form (because they take the full slice NAL unit), see 
> > > > > > > the
> > > > > > > discussions in this thread:
> > > > > > > media: docs-rst: Document m2m stateless video decoder interface
> > > > > >
> > > > > > Stateless just means it won't track the previous result, but I
> > > > > > don't think you can define what data the hardware would need.
> > > > > > Even if you just build a DPB for the decoder, it is still
> > > > > > stateless; parsing less or more data from the bitstream doesn't
> > > > > > stop a decoder from being a stateless decoder.
> > > > >
> > > > > Yes fair enough, the format in which the hardware decoder takes the
> > > > > bitstream parameters does not make it stateless or stateful per-se.
> > > > > It's just that stateless decoders should have no particular reason for
> > > > > parsing the bitstream on their own since the hardware can be designed
> > > > > with registers for each relevant bitstream element to configure the
> > > > > decoding pipeline. That's how GPU-based decoders are implemented
> > > > > (VAAPI/VDPAU/NVDEC, etc.).
> > > > >
> > > > > So the format we have agreed on so far for the stateless interface is
> > > > > to pass parsed elements via v4l2 control structures.
> > > > >
> > > > > If the hardware can only work by parsing the bitstream itself, I'm not
> > > > > sure what the best solution would be. Reconstructing the bitstream in
> > > > > the kernel is a pretty bad option, but so is parsing in the kernel or
> > > > > having the data both in parsed and raw forms. Do you see another
> > > > > possibility?
> > > >
> > > > Is reconstructing the bitstream so bad? The v4l2 controls provide a
> > > > generic interface to an encoded format which the driver needs to
> > > > convert into a sequence that the hardware can understand. Typically
> > > > this is done by populating hardware-specific structures. Can't we
> > > > consider that in this specific instance, the hardware-specific
> > > > structure just happens to be identical to the original bitstream
> > > > format?
> > >
> > > At the maximum allowed bitrate for, let's say, HEVC (940MB/s iirc),
> > > yes, it would be really, really bad. In the GStreamer project we
> > > have discussed for a while (but never did anything about it) adding
> > > the ability, through a bitmask, to select which parts of the stream
> > > need to be parsed, as parsing itself was causing some overhead.
> > > Maybe a similar thing applies here, though as per our new design,
> > > it's the fourcc that dictates the driver behaviour; we'd need yet
> > > another fourcc for drivers that want the full bitstream (which seems
> > > odd if you have already parsed everything; I think this needs some
> > > clarification).
> >
> > Note that I am not proposing to rebuild the *entire* bitstream
> > in-kernel. What I am saying is that if the hardware interprets some
> > structures (like SPS/PPS) in their raw format, this raw 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Tomasz Figa
On Wed, Jan 30, 2019 at 3:28 PM Ayaka  wrote:
>
>
>
> Sent from my iPad
>
> > On Jan 30, 2019, at 11:35 AM, Tomasz Figa  wrote:
> >
> > On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
> >  wrote:
> >>
> >>> On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne  
> >>> wrote:
> >>>
>  Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
>  On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
>   wrote:
> > Hi,
> >
> >> On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> >> Sent from my iPad
> >>
> >>> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
> >>>  wrote:
> >>>
> >>> Hi,
> >>>
>  On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
I forgot an important thing: the rkvdec and RK HEVC decoders request
the CABAC table, scaling list, picture parameter set and reference
pictures to be stored in one or several DMA buffers. I am not talking
about the parsed data; the decoder requests the raw data.

For the PPS and RPS, it is possible to reuse the slice header: just let
the decoder know the offset from the bitstream buffer. I would suggest
adding three properties (with the SPS) for them. But I think we need a
method to mark an OUTPUT-side buffer for those aux data.
> >>>
> >>> I'm quite confused about the hardware implementation then. From what
> >>> you're saying, it seems that it takes the raw bitstream elements 
> >>> rather
> >>> than parsed elements. Is it really a stateless implementation?
> >>>
> >>> The stateless implementation was designed with the idea that only the
> >>> raw slice data should be passed in bitstream form to the decoder. For
> >>> H.264, it seems that some decoders also need the slice header in raw
> >>> bitstream form (because they take the full slice NAL unit), see the
> >>> discussions in this thread:
> >>> media: docs-rst: Document m2m stateless video decoder interface
> >>
> >> Stateless just means it won't track the previous result, but I don't
> >> think you can define what data the hardware would need. Even if you
> >> just build a DPB for the decoder, it is still stateless; parsing less
> >> or more data from the bitstream doesn't stop a decoder from being a
> >> stateless decoder.
> >
> > Yes fair enough, the format in which the hardware decoder takes the
> > bitstream parameters does not make it stateless or stateful per-se.
> > It's just that stateless decoders should have no particular reason for
> > parsing the bitstream on their own since the hardware can be designed
> > with registers for each relevant bitstream element to configure the
> > decoding pipeline. That's how GPU-based decoders are implemented
> > (VAAPI/VDPAU/NVDEC, etc.).
> >
> > So the format we have agreed on so far for the stateless interface is
> > to pass parsed elements via v4l2 control structures.
> >
> > If the hardware can only work by parsing the bitstream itself, I'm not
> > sure what the best solution would be. Reconstructing the bitstream in
> > the kernel is a pretty bad option, but so is parsing in the kernel or
> > having the data both in parsed and raw forms. Do you see another
> > possibility?
> 
>  Is reconstructing the bitstream so bad? The v4l2 controls provide a
>  generic interface to an encoded format which the driver needs to
>  convert into a sequence that the hardware can understand. Typically
>  this is done by populating hardware-specific structures. Can't we
>  consider that in this specific instance, the hardware-specific
>  structure just happens to be identical to the original bitstream
>  format?
> >>>
> >>> At the maximum allowed bitrate for, let's say, HEVC (940MB/s iirc),
> >>> yes, it would be really, really bad. In the GStreamer project we
> >>> have discussed for a while (but never did anything about it) adding
> >>> the ability, through a bitmask, to select which parts of the stream
> >>> need to be parsed, as parsing itself was causing some overhead.
> >>> Maybe a similar thing applies here, though as per our new design,
> >>> it's the fourcc that dictates the driver behaviour; we'd need yet
> >>> another fourcc for drivers that want the full bitstream (which seems
> >>> odd if you have already parsed everything; I think this needs some
> >>> clarification).
> >>
> >> Note that I am not proposing to rebuild the *entire* bitstream
> >> in-kernel. What I am saying is that if the hardware interprets some
> >> structures (like SPS/PPS) in their raw format, this raw format could
> >> be reconstructed from the structures passed by userspace at negligible
> >> cost. Such manipulation would only happen on a small amount of data.
> >>
> >> Exposing finer-grained driver 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Ayaka


Sent from my iPad

> On Jan 30, 2019, at 5:41 AM, Nicolas Dufresne  wrote:
> 
>> Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
>> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
>>  wrote:
>>> Hi,
>>> 
 On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
 Sent from my iPad
 
> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
>  wrote:
> 
> Hi,
> 
>> On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
>> I forgot an important thing: the rkvdec and RK HEVC decoders request
>> the CABAC table, scaling list, picture parameter set and reference
>> pictures to be stored in one or several DMA buffers. I am not talking
>> about the parsed data; the decoder requests the raw data.
>> 
>> For the PPS and RPS, it is possible to reuse the slice header: just let
>> the decoder know the offset from the bitstream buffer. I would suggest
>> adding three properties (with the SPS) for them. But I think we need a
>> method to mark an OUTPUT-side buffer for those aux data.
> 
> I'm quite confused about the hardware implementation then. From what
> you're saying, it seems that it takes the raw bitstream elements rather
> than parsed elements. Is it really a stateless implementation?
> 
> The stateless implementation was designed with the idea that only the
> raw slice data should be passed in bitstream form to the decoder. For
> H.264, it seems that some decoders also need the slice header in raw
> bitstream form (because they take the full slice NAL unit), see the
> discussions in this thread:
> media: docs-rst: Document m2m stateless video decoder interface
 
Stateless just means it won't track the previous result, but I don't
think you can define what data the hardware would need. Even if you
just build a DPB for the decoder, it is still stateless; parsing
less or more data from the bitstream doesn't stop a decoder from
being a stateless decoder.
>>> 
>>> Yes fair enough, the format in which the hardware decoder takes the
>>> bitstream parameters does not make it stateless or stateful per-se.
>>> It's just that stateless decoders should have no particular reason for
>>> parsing the bitstream on their own since the hardware can be designed
>>> with registers for each relevant bitstream element to configure the
>>> decoding pipeline. That's how GPU-based decoders are implemented
>>> (VAAPI/VDPAU/NVDEC, etc.).
>>> 
>>> So the format we have agreed on so far for the stateless interface is
>>> to pass parsed elements via v4l2 control structures.
>>> 
>>> If the hardware can only work by parsing the bitstream itself, I'm not
>>> sure what the best solution would be. Reconstructing the bitstream in
>>> the kernel is a pretty bad option, but so is parsing in the kernel or
>>> having the data both in parsed and raw forms. Do you see another
>>> possibility?
>> 
>> Is reconstructing the bitstream so bad? The v4l2 controls provide a
>> generic interface to an encoded format which the driver needs to
>> convert into a sequence that the hardware can understand. Typically
>> this is done by populating hardware-specific structures. Can't we
>> consider that in this specific instance, the hardware-specific
>> structure just happens to be identical to the original bitstream
>> format?
> 
> At the maximum allowed bitrate for, let's say, HEVC (940MB/s iirc), yes, it
Luckily, most hardware won't be able to process such a big buffer.
Generally speaking, the register is 24 bits for the stream length in bytes.
> would be really, really bad. In the GStreamer project we have discussed
> for a while (but never did anything about it) adding the ability,
> through a bitmask, to select which parts of the stream need to be
> parsed, as parsing itself was causing some overhead. Maybe a similar
> thing applies here, though as per our new design, it's the fourcc that
> dictates the driver behaviour; we'd need yet another fourcc for drivers
> that want the full bitstream (which seems odd if you have already parsed
> everything; I think this needs some clarification).
> 
>> 
>> I agree that this is not strictly optimal for that particular
>> hardware, but such is the cost of abstractions, and in this specific
>> case I don't believe the cost would be particularly high?
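On the 24-bit stream-length register mentioned above: a minimal
illustrative sketch of the limit it implies (the names are hypothetical,
not from any driver):

    /* A 24-bit byte-length register caps a single bitstream buffer at
     * (1 << 24) - 1 bytes, i.e. just under 16 MiB. */
    #define VDEC_MAX_STREAM_LEN	((1UL << 24) - 1)

    static int vdec_check_stream_len(unsigned long len)
    {
            return len <= VDEC_MAX_STREAM_LEN ? 0 : -1; /* -1: too large */
    }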



Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Ayaka


Sent from my iPad

> On Jan 30, 2019, at 11:35 AM, Tomasz Figa  wrote:
> 
> On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
>  wrote:
>> 
>>> On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne  
>>> wrote:
>>> 
 Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
 On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
  wrote:
> Hi,
> 
>> On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
>> Sent from my iPad
>> 
>>> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
>>>  wrote:
>>> 
>>> Hi,
>>> 
 On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
I forgot an important thing: the rkvdec and RK HEVC decoders request
the CABAC table, scaling list, picture parameter set and reference
pictures to be stored in one or several DMA buffers. I am not talking
about the parsed data; the decoder requests the raw data.

For the PPS and RPS, it is possible to reuse the slice header: just let
the decoder know the offset from the bitstream buffer. I would suggest
adding three properties (with the SPS) for them. But I think we need a
method to mark an OUTPUT-side buffer for those aux data.
>>> 
>>> I'm quite confused about the hardware implementation then. From what
>>> you're saying, it seems that it takes the raw bitstream elements rather
>>> than parsed elements. Is it really a stateless implementation?
>>> 
>>> The stateless implementation was designed with the idea that only the
>>> raw slice data should be passed in bitstream form to the decoder. For
>>> H.264, it seems that some decoders also need the slice header in raw
>>> bitstream form (because they take the full slice NAL unit), see the
>>> discussions in this thread:
>>> media: docs-rst: Document m2m stateless video decoder interface
>> 
>> Stateless just means it won't track the previous result, but I don't
>> think you can define what data the hardware would need. Even if you
>> just build a DPB for the decoder, it is still stateless; parsing less
>> or more data from the bitstream doesn't stop a decoder from being a
>> stateless decoder.
> 
> Yes fair enough, the format in which the hardware decoder takes the
> bitstream parameters does not make it stateless or stateful per-se.
> It's just that stateless decoders should have no particular reason for
> parsing the bitstream on their own since the hardware can be designed
> with registers for each relevant bitstream element to configure the
> decoding pipeline. That's how GPU-based decoders are implemented
> (VAAPI/VDPAU/NVDEC, etc.).
> 
> So the format we have agreed on so far for the stateless interface is
> to pass parsed elements via v4l2 control structures.
> 
> If the hardware can only work by parsing the bitstream itself, I'm not
> sure what the best solution would be. Reconstructing the bitstream in
> the kernel is a pretty bad option, but so is parsing in the kernel or
> having the data both in parsed and raw forms. Do you see another
> possibility?
 
 Is reconstructing the bitstream so bad? The v4l2 controls provide a
 generic interface to an encoded format which the driver needs to
 convert into a sequence that the hardware can understand. Typically
 this is done by populating hardware-specific structures. Can't we
 consider that in this specific instance, the hardware-specific
 structure just happens to be identical to the original bitstream
 format?
>>> 
>>> At the maximum allowed bitrate for, let's say, HEVC (940MB/s iirc),
>>> yes, it would be really, really bad. In the GStreamer project we have
>>> discussed for a while (but never did anything about it) adding the
>>> ability, through a bitmask, to select which parts of the stream need
>>> to be parsed, as parsing itself was causing some overhead. Maybe a
>>> similar thing applies here, though as per our new design, it's the
>>> fourcc that dictates the driver behaviour; we'd need yet another
>>> fourcc for drivers that want the full bitstream (which seems odd if
>>> you have already parsed everything; I think this needs some
>>> clarification).
>> 
>> Note that I am not proposing to rebuild the *entire* bitstream
>> in-kernel. What I am saying is that if the hardware interprets some
>> structures (like SPS/PPS) in their raw format, this raw format could
>> be reconstructed from the structures passed by userspace at negligible
>> cost. Such manipulation would only happen on a small amount of data.
>> 
>> Exposing finer-grained driver requirements through a bitmask may
>> deserve more exploring. Maybe we could end with a spectrum of
>> capabilities that would allow us to cover the range from fully
>> stateless to fully stateful IPs more smoothly. Right now we have two
>> specifications that only 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Tomasz Figa
On Wed, Jan 30, 2019 at 11:29 AM Alexandre Courbot
 wrote:
>
> On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne  wrote:
> >
> > Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
> > > On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> > >  wrote:
> > > > Hi,
> > > >
> > > > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > > > > Sent from my iPad
> > > > >
> > > > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
> > > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > > > I forgot an important thing: the rkvdec and RK HEVC
> > > > > > > decoders request the CABAC table, scaling list, picture
> > > > > > > parameter set and reference pictures to be stored in one or
> > > > > > > several DMA buffers. I am not talking about the parsed data;
> > > > > > > the decoder requests the raw data.
> > > > > > >
> > > > > > > For the PPS and RPS, it is possible to reuse the slice
> > > > > > > header: just let the decoder know the offset from the
> > > > > > > bitstream buffer. I would suggest adding three properties
> > > > > > > (with the SPS) for them. But I think we need a method to
> > > > > > > mark an OUTPUT-side buffer for those aux data.
> > > > > >
> > > > > > I'm quite confused about the hardware implementation then. From what
> > > > > > you're saying, it seems that it takes the raw bitstream elements 
> > > > > > rather
> > > > > > than parsed elements. Is it really a stateless implementation?
> > > > > >
> > > > > > The stateless implementation was designed with the idea that only 
> > > > > > the
> > > > > > raw slice data should be passed in bitstream form to the decoder. 
> > > > > > For
> > > > > > H.264, it seems that some decoders also need the slice header in raw
> > > > > > bitstream form (because they take the full slice NAL unit), see the
> > > > > > discussions in this thread:
> > > > > > media: docs-rst: Document m2m stateless video decoder interface
> > > > >
> > > > > Stateless just means it won't track the previous result, but I
> > > > > don't think you can define what data the hardware would need.
> > > > > Even if you just build a DPB for the decoder, it is still
> > > > > stateless; parsing less or more data from the bitstream doesn't
> > > > > stop a decoder from being a stateless decoder.
> > > >
> > > > Yes fair enough, the format in which the hardware decoder takes the
> > > > bitstream parameters does not make it stateless or stateful per-se.
> > > > It's just that stateless decoders should have no particular reason for
> > > > parsing the bitstream on their own since the hardware can be designed
> > > > with registers for each relevant bitstream element to configure the
> > > > decoding pipeline. That's how GPU-based decoders are implemented
> > > > (VAAPI/VDPAU/NVDEC, etc.).
> > > >
> > > > So the format we have agreed on so far for the stateless interface is
> > > > to pass parsed elements via v4l2 control structures.
> > > >
> > > > If the hardware can only work by parsing the bitstream itself, I'm not
> > > > sure what the best solution would be. Reconstructing the bitstream in
> > > > the kernel is a pretty bad option, but so is parsing in the kernel or
> > > > having the data both in parsed and raw forms. Do you see another
> > > > possibility?
> > >
> > > Is reconstructing the bitstream so bad? The v4l2 controls provide a
> > > generic interface to an encoded format which the driver needs to
> > > convert into a sequence that the hardware can understand. Typically
> > > this is done by populating hardware-specific structures. Can't we
> > > consider that in this specific instance, the hardware-specific
> > > structure just happens to be identical to the original bitstream
> > > format?
> >
> > At the maximum allowed bitrate for, let's say, HEVC (940MB/s iirc),
> > yes, it would be really, really bad. In the GStreamer project we have
> > discussed for a while (but never did anything about it) adding the
> > ability, through a bitmask, to select which parts of the stream need
> > to be parsed, as parsing itself was causing some overhead. Maybe a
> > similar thing applies here, though as per our new design, it's the
> > fourcc that dictates the driver behaviour; we'd need yet another
> > fourcc for drivers that want the full bitstream (which seems odd if
> > you have already parsed everything; I think this needs some
> > clarification).
>
> Note that I am not proposing to rebuild the *entire* bitstream
> in-kernel. What I am saying is that if the hardware interprets some
> structures (like SPS/PPS) in their raw format, this raw format could
> be reconstructed from the structures passed by userspace at negligible
> cost. Such manipulation would only happen on a small amount of data.
>
> Exposing finer-grained driver requirements through a bitmask may
> deserve more exploring. Maybe we could end with a spectrum of
> 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Alexandre Courbot
On Wed, Jan 30, 2019 at 6:41 AM Nicolas Dufresne  wrote:
>
> Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
> > On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> >  wrote:
> > > Hi,
> > >
> > > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > > > Sent from my iPad
> > > >
> > > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
> > > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > > I forgot an important thing: the rkvdec and RK HEVC decoders
> > > > > > request the CABAC table, scaling list, picture parameter set
> > > > > > and reference pictures to be stored in one or several DMA
> > > > > > buffers. I am not talking about the parsed data; the decoder
> > > > > > requests the raw data.
> > > > > >
> > > > > > For the PPS and RPS, it is possible to reuse the slice header:
> > > > > > just let the decoder know the offset from the bitstream
> > > > > > buffer. I would suggest adding three properties (with the SPS)
> > > > > > for them. But I think we need a method to mark an OUTPUT-side
> > > > > > buffer for those aux data.
> > > > >
> > > > > I'm quite confused about the hardware implementation then. From what
> > > > > you're saying, it seems that it takes the raw bitstream elements 
> > > > > rather
> > > > > than parsed elements. Is it really a stateless implementation?
> > > > >
> > > > > The stateless implementation was designed with the idea that only the
> > > > > raw slice data should be passed in bitstream form to the decoder. For
> > > > > H.264, it seems that some decoders also need the slice header in raw
> > > > > bitstream form (because they take the full slice NAL unit), see the
> > > > > discussions in this thread:
> > > > > media: docs-rst: Document m2m stateless video decoder interface
> > > >
> > > > Stateless just means it won't track the previous result, but I
> > > > don't think you can define what data the hardware would need. Even
> > > > if you just build a DPB for the decoder, it is still stateless;
> > > > parsing less or more data from the bitstream doesn't stop a
> > > > decoder from being a stateless decoder.
> > >
> > > Yes fair enough, the format in which the hardware decoder takes the
> > > bitstream parameters does not make it stateless or stateful per-se.
> > > It's just that stateless decoders should have no particular reason for
> > > parsing the bitstream on their own since the hardware can be designed
> > > with registers for each relevant bitstream element to configure the
> > > decoding pipeline. That's how GPU-based decoders are implemented
> > > (VAAPI/VDPAU/NVDEC, etc.).
> > >
> > > So the format we have agreed on so far for the stateless interface is
> > > to pass parsed elements via v4l2 control structures.
> > >
> > > If the hardware can only work by parsing the bitstream itself, I'm not
> > > sure what the best solution would be. Reconstructing the bitstream in
> > > the kernel is a pretty bad option, but so is parsing in the kernel or
> > > having the data both in parsed and raw forms. Do you see another
> > > possibility?
> >
> > Is reconstructing the bitstream so bad? The v4l2 controls provide a
> > generic interface to an encoded format which the driver needs to
> > convert into a sequence that the hardware can understand. Typically
> > this is done by populating hardware-specific structures. Can't we
> > consider that in this specific instance, the hardware-specific
> > structure just happens to be identical to the original bitstream
> > format?
>
> At the maximum allowed bitrate for, let's say, HEVC (940MB/s iirc), yes,
> it would be really, really bad. In the GStreamer project we have
> discussed for a while (but never did anything about it) adding the
> ability, through a bitmask, to select which parts of the stream need to
> be parsed, as parsing itself was causing some overhead. Maybe a similar
> thing applies here, though as per our new design, it's the fourcc that
> dictates the driver behaviour; we'd need yet another fourcc for drivers
> that want the full bitstream (which seems odd if you have already parsed
> everything; I think this needs some clarification).

Note that I am not proposing to rebuild the *entire* bitstream
in-kernel. What I am saying is that if the hardware interprets some
structures (like SPS/PPS) in their raw format, this raw format could
be reconstructed from the structures passed by userspace at negligible
cost. Such manipulation would only happen on a small amount of data.
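
For illustration, a minimal sketch of what such an in-kernel
re-serialization would rest on: a bit writer with the unsigned Exp-Golomb
coding that H.264/HEVC parameter sets use (hypothetical helper names, no
bounds checking, not part of this series):

    #include <stddef.h>
    #include <stdint.h>

    struct bit_writer {
            uint8_t *buf;
            size_t byte;    /* current byte offset */
            unsigned bit;   /* bits already used in the current byte, 0..7 */
    };

    static void put_bit(struct bit_writer *w, unsigned b)
    {
            if (w->bit == 0)
                    w->buf[w->byte] = 0;
            w->buf[w->byte] |= (b & 1) << (7 - w->bit);
            if (++w->bit == 8) {
                    w->bit = 0;
                    w->byte++;
            }
    }

    static void put_bits(struct bit_writer *w, uint32_t val, unsigned n)
    {
            while (n--)
                    put_bit(w, (val >> n) & 1);
    }

    /* ue(v): unsigned Exp-Golomb, used for most parameter-set fields. */
    static void put_ue(struct bit_writer *w, uint32_t val)
    {
            unsigned len = 0;

            while (((val + 1) >> len) > 1)
                    len++;
            put_bits(w, 0, len);            /* len leading zero bits... */
            put_bits(w, val + 1, len + 1);  /* ...then the value itself */
    }

Re-emitting a PPS would then be a short sequence of put_ue()/put_bits()
calls over the fields userspace already passed in the control structures.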

Exposing finer-grained driver requirements through a bitmask may
deserve more exploring. Maybe we could end with a spectrum of
capabilities that would allow us to cover the range from fully
stateless to fully stateful IPs more smoothly. Right now we have two
specifications that only consider the extremes of that range.

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Nicolas Dufresne
Le mardi 29 janvier 2019 à 16:44 +0900, Alexandre Courbot a écrit :
> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
>  wrote:
> > Hi,
> > 
> > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > > Sent from my iPad
> > > 
> > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
> > > >  wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > I forgot an important thing: the rkvdec and RK HEVC decoders
> > > > > request the CABAC table, scaling list, picture parameter set and
> > > > > reference pictures to be stored in one or several DMA buffers. I
> > > > > am not talking about the parsed data; the decoder requests the
> > > > > raw data.
> > > > >
> > > > > For the PPS and RPS, it is possible to reuse the slice header:
> > > > > just let the decoder know the offset from the bitstream buffer.
> > > > > I would suggest adding three properties (with the SPS) for them.
> > > > > But I think we need a method to mark an OUTPUT-side buffer for
> > > > > those aux data.
> > > > 
> > > > I'm quite confused about the hardware implementation then. From what
> > > > you're saying, it seems that it takes the raw bitstream elements rather
> > > > than parsed elements. Is it really a stateless implementation?
> > > > 
> > > > The stateless implementation was designed with the idea that only the
> > > > raw slice data should be passed in bitstream form to the decoder. For
> > > > H.264, it seems that some decoders also need the slice header in raw
> > > > bitstream form (because they take the full slice NAL unit), see the
> > > > discussions in this thread:
> > > > media: docs-rst: Document m2m stateless video decoder interface
> > > 
> > > Stateless just means it won't track the previous result, but I don't
> > > think you can define what data the hardware would need. Even if you
> > > just build a DPB for the decoder, it is still stateless; parsing
> > > less or more data from the bitstream doesn't stop a decoder from
> > > being a stateless decoder.
> > 
> > Yes fair enough, the format in which the hardware decoder takes the
> > bitstream parameters does not make it stateless or stateful per-se.
> > It's just that stateless decoders should have no particular reason for
> > parsing the bitstream on their own since the hardware can be designed
> > with registers for each relevant bitstream element to configure the
> > decoding pipeline. That's how GPU-based decoders are implemented
> > (VAAPI/VDPAU/NVDEC, etc.).
> > 
> > So the format we have agreed on so far for the stateless interface is
> > to pass parsed elements via v4l2 control structures.
> > 
> > If the hardware can only work by parsing the bitstream itself, I'm not
> > sure what the best solution would be. Reconstructing the bitstream in
> > the kernel is a pretty bad option, but so is parsing in the kernel or
> > having the data both in parsed and raw forms. Do you see another
> > possibility?
> 
> Is reconstructing the bitstream so bad? The v4l2 controls provide a
> generic interface to an encoded format which the driver needs to
> convert into a sequence that the hardware can understand. Typically
> this is done by populating hardware-specific structures. Can't we
> consider that in this specific instance, the hardware-specific
> structure just happens to be identical to the original bitstream
> format?

At the maximum allowed bitrate for, let's say, HEVC (940MB/s iirc), yes,
it would be really, really bad. In the GStreamer project we have
discussed for a while (but never did anything about it) adding the
ability, through a bitmask, to select which parts of the stream need to
be parsed, as parsing itself was causing some overhead. Maybe a similar
thing applies here, though as per our new design, it's the fourcc that
dictates the driver behaviour; we'd need yet another fourcc for drivers
that want the full bitstream (which seems odd if you have already parsed
everything; I think this needs some clarification).
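
A hypothetical sketch of that bitmask idea (none of these flags exist in
V4L2; they only illustrate how a driver could advertise which parts of
the stream it wants in raw form):

    #define VDEC_NEEDS_RAW_SLICE_DATA   (1 << 0) /* raw slice payload */
    #define VDEC_NEEDS_RAW_SLICE_HEADER (1 << 1) /* unparsed slice header */
    #define VDEC_NEEDS_RAW_SPS          (1 << 2) /* raw SPS NAL unit */
    #define VDEC_NEEDS_RAW_PPS          (1 << 3) /* raw PPS NAL unit */
    #define VDEC_NEEDS_PARSED_CONTROLS  (1 << 4) /* parsed v4l2 controls */

A fully stateless decoder would report only RAW_SLICE_DATA plus
PARSED_CONTROLS; the Rockchip decoder described in this thread would also
set the raw SPS/PPS/slice-header bits, and userspace could then skip
whatever parsing the hardware redoes itself.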

> 
> I agree that this is not strictly optimal for that particular
> hardware, but such is the cost of abstractions, and in this specific
> case I don't believe the cost would be particularly high?




Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Tomasz Figa
On Tue, Jan 29, 2019 at 5:09 PM Maxime Ripard  wrote:
>
> On Tue, Jan 29, 2019 at 04:44:35PM +0900, Alexandre Courbot wrote:
> > On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> > > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > > >
> > > > Sent from my iPad
> > > >
> > > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
> > > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > > I forgot an important thing: the rkvdec and RK HEVC decoders
> > > > > > request the CABAC table, scaling list, picture parameter set
> > > > > > and reference pictures to be stored in one or several DMA
> > > > > > buffers. I am not talking about the parsed data; the decoder
> > > > > > requests the raw data.
> > > > > >
> > > > > > For the PPS and RPS, it is possible to reuse the slice header:
> > > > > > just let the decoder know the offset from the bitstream
> > > > > > buffer. I would suggest adding three properties (with the SPS)
> > > > > > for them. But I think we need a method to mark an OUTPUT-side
> > > > > > buffer for those aux data.
> > > > >
> > > > > I'm quite confused about the hardware implementation then. From what
> > > > > you're saying, it seems that it takes the raw bitstream elements 
> > > > > rather
> > > > > than parsed elements. Is it really a stateless implementation?
> > > > >
> > > > > The stateless implementation was designed with the idea that only the
> > > > > raw slice data should be passed in bitstream form to the decoder. For
> > > > > H.264, it seems that some decoders also need the slice header in raw
> > > > > bitstream form (because they take the full slice NAL unit), see the
> > > > > discussions in this thread:
> > > > > media: docs-rst: Document m2m stateless video decoder interface
> > > >
> > > > Stateless just means it won't track the previous result, but I
> > > > don't think you can define what data the hardware would need. Even
> > > > if you just build a DPB for the decoder, it is still stateless;
> > > > parsing less or more data from the bitstream doesn't stop a
> > > > decoder from being a stateless decoder.
> > >
> > > Yes fair enough, the format in which the hardware decoder takes the
> > > bitstream parameters does not make it stateless or stateful per-se.
> > > It's just that stateless decoders should have no particular reason for
> > > parsing the bitstream on their own since the hardware can be designed
> > > with registers for each relevant bitstream element to configure the
> > > decoding pipeline. That's how GPU-based decoders are implemented
> > > (VAAPI/VDPAU/NVDEC, etc.).
> > >
> > > So the format we have agreed on so far for the stateless interface is
> > > to pass parsed elements via v4l2 control structures.
> > >
> > > If the hardware can only work by parsing the bitstream itself, I'm not
> > > sure what the best solution would be. Reconstructing the bitstream in
> > > the kernel is a pretty bad option, but so is parsing in the kernel or
> > > having the data both in parsed and raw forms. Do you see another
> > > possibility?
> >
> > Is reconstructing the bitstream so bad? The v4l2 controls provide a
> > generic interface to an encoded format which the driver needs to
> > convert into a sequence that the hardware can understand. Typically
> > this is done by populating hardware-specific structures. Can't we
> > consider that in this specific instance, the hardware-specific
> > structure just happens to be identical to the original bitstream
> > format?
> >
> > I agree that this is not strictly optimal for that particular
> > hardware, but such is the cost of abstractions, and in this specific
> > case I don't believe the cost would be particularly high?
>
> I mean, that argument can be made for the rockchip driver as well. If
> reconstructing the bitstream is something we can do, and if we don't
> care about being suboptimal for one particular piece of hardware, then
> why doesn't the rockchip driver just recreate the bitstream from that
> API?
>
> After all, this is just a hardware-specific header that happens to be
> identical to the original bitstream format.

I think in another thread (about H.264 I believe), we realized that it
could be a good idea to just include the Slice NAL units in the
Annex.B format in the buffers and that should work for all the
hardware we could think of (given offsets to particular parts inside
of the buffer). Wouldn't something similar work here for HEVC?
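
For illustration, a sketch of the offset computation that approach
implies: scanning an Annex.B buffer for start codes (plain C, an assumed
userspace helper, not code from this series):

    #include <stddef.h>
    #include <stdint.h>

    /* Return the offset of the first byte after the next start code
     * (00 00 01 or 00 00 00 01), i.e. the first byte of a NAL unit,
     * or len if no start code is found. Emulation-prevention bytes
     * guarantee start codes cannot occur inside a NAL unit. */
    static size_t next_nal(const uint8_t *buf, size_t len, size_t pos)
    {
            size_t zeros = 0;

            for (; pos < len; pos++) {
                    if (buf[pos] == 0x00)
                            zeros++;
                    else if (buf[pos] == 0x01 && zeros >= 2)
                            return pos + 1;
                    else
                            zeros = 0;
            }
            return len;
    }

Userspace could pass such offsets alongside the buffer so the driver
knows where the slice header, and the parts the hardware parses itself,
begin.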

I don't really get the meaning of "raw" for "cabac table, scaling
list, picture parameter set and reference picture", since those are
parts of the bitstream, which need to be parsed to obtain them.

Best regards,
Tomasz


Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-29 Thread Maxime Ripard
On Tue, Jan 29, 2019 at 04:44:35PM +0900, Alexandre Courbot wrote:
> On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
> > On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> > >
> > > Sent from my iPad
> > >
> > > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
> > > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > > I forgot an important thing: the rkvdec and RK HEVC decoders
> > > > > request the CABAC table, scaling list, picture parameter set and
> > > > > reference pictures to be stored in one or several DMA buffers. I
> > > > > am not talking about the parsed data; the decoder requests the
> > > > > raw data.
> > > > >
> > > > > For the PPS and RPS, it is possible to reuse the slice header:
> > > > > just let the decoder know the offset from the bitstream buffer.
> > > > > I would suggest adding three properties (with the SPS) for them.
> > > > > But I think we need a method to mark an OUTPUT-side buffer for
> > > > > those aux data.
> > > >
> > > > I'm quite confused about the hardware implementation then. From what
> > > > you're saying, it seems that it takes the raw bitstream elements rather
> > > > than parsed elements. Is it really a stateless implementation?
> > > >
> > > > The stateless implementation was designed with the idea that only the
> > > > raw slice data should be passed in bitstream form to the decoder. For
> > > > H.264, it seems that some decoders also need the slice header in raw
> > > > bitstream form (because they take the full slice NAL unit), see the
> > > > discussions in this thread:
> > > > media: docs-rst: Document m2m stateless video decoder interface
> > >
> > > Stateless just means it won't track the previous result, but I don't
> > > think you can define what data the hardware would need. Even if you
> > > just build a DPB for the decoder, it is still stateless; parsing
> > > less or more data from the bitstream doesn't stop a decoder from
> > > being a stateless decoder.
> >
> > Yes fair enough, the format in which the hardware decoder takes the
> > bitstream parameters does not make it stateless or stateful per-se.
> > It's just that stateless decoders should have no particular reason for
> > parsing the bitstream on their own since the hardware can be designed
> > with registers for each relevant bitstream element to configure the
> > decoding pipeline. That's how GPU-based decoders are implemented
> > (VAAPI/VDPAU/NVDEC, etc.).
> >
> > So the format we have agreed on so far for the stateless interface is
> > to pass parsed elements via v4l2 control structures.
> >
> > If the hardware can only work by parsing the bitstream itself, I'm not
> > sure what the best solution would be. Reconstructing the bitstream in
> > the kernel is a pretty bad option, but so is parsing in the kernel or
> > having the data both in parsed and raw forms. Do you see another
> > possibility?
> 
> Is reconstructing the bitstream so bad? The v4l2 controls provide a
> generic interface to an encoded format which the driver needs to
> convert into a sequence that the hardware can understand. Typically
> this is done by populating hardware-specific structures. Can't we
> consider that in this specific instance, the hardware-specific
> structure just happens to be identical to the original bitstream
> format?
> 
> I agree that this is not strictly optimal for that particular
> hardware, but such is the cost of abstractions, and in this specific
> case I don't believe the cost would be particularly high?

I mean, that argument can be made for the rockchip driver as well. If
reconstructing the bitstream is something we can do, and if we don't
care about being suboptimal for one particular piece of hardware, then
why doesn't the rockchip driver just recreate the bitstream from that
API?

After all, this is just a hardware-specific header that happens to be
identical to the original bitstream format.

Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-28 Thread Alexandre Courbot
On Fri, Jan 25, 2019 at 10:04 PM Paul Kocialkowski
 wrote:
>
> Hi,
>
> On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> >
> > Sent from my iPad
> >
> > > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
> > >  wrote:
> > >
> > > Hi,
> > >
> > > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > > I forgot an important thing: the rkvdec and RK HEVC decoders
> > > > request the CABAC table, scaling list, picture parameter set and
> > > > reference pictures to be stored in one or several DMA buffers. I
> > > > am not talking about the parsed data; the decoder requests the raw
> > > > data.
> > > >
> > > > For the PPS and RPS, it is possible to reuse the slice header:
> > > > just let the decoder know the offset from the bitstream buffer. I
> > > > would suggest adding three properties (with the SPS) for them. But
> > > > I think we need a method to mark an OUTPUT-side buffer for those
> > > > aux data.
> > >
> > > I'm quite confused about the hardware implementation then. From what
> > > you're saying, it seems that it takes the raw bitstream elements rather
> > > than parsed elements. Is it really a stateless implementation?
> > >
> > > The stateless implementation was designed with the idea that only the
> > > raw slice data should be passed in bitstream form to the decoder. For
> > > H.264, it seems that some decoders also need the slice header in raw
> > > bitstream form (because they take the full slice NAL unit), see the
> > > discussions in this thread:
> > > media: docs-rst: Document m2m stateless video decoder interface
> >
> > Stateless just means it won't track the previous result, but I don't
> > think you can define what data the hardware would need. Even if you
> > just build a DPB for the decoder, it is still stateless; parsing less
> > or more data from the bitstream doesn't stop a decoder from being a
> > stateless decoder.
>
> Yes fair enough, the format in which the hardware decoder takes the
> bitstream parameters does not make it stateless or stateful per-se.
> It's just that stateless decoders should have no particular reason for
> parsing the bitstream on their own since the hardware can be designed
> with registers for each relevant bitstream element to configure the
> decoding pipeline. That's how GPU-based decoders are implemented
> (VAAPI/VDPAU/NVDEC, etc.).
>
> So the format we have agreed on so far for the stateless interface is
> to pass parsed elements via v4l2 control structures.
>
> If the hardware can only work by parsing the bitstream itself, I'm not
> sure what the best solution would be. Reconstructing the bitstream in
> the kernel is a pretty bad option, but so is parsing in the kernel or
> having the data both in parsed and raw forms. Do you see another
> possibility?

Is reconstructing the bitstream so bad? The v4l2 controls provide a
generic interface to an encoded format which the driver needs to
convert into a sequence that the hardware can understand. Typically
this is done by populating hardware-specific structures. Can't we
consider that in this specific instance, the hardware-specific
structure just happens to be identical to the original bitstream
format?

I agree that this is not strictly optimal for that particular
hardware, but such is the cost of abstractions, and in this specific
case I don't believe the cost would be particularly high?


Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-25 Thread Paul Kocialkowski
Hi,

On Thu, 2019-01-24 at 20:23 +0800, Ayaka wrote:
> 
> Sent from my iPad
> 
> > On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
> >  wrote:
> > 
> > Hi,
> > 
> > > On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> > > I forgot an important thing: the rkvdec and RK HEVC decoders request
> > > the CABAC table, scaling list, picture parameter set and reference
> > > pictures to be stored in one or several DMA buffers. I am not
> > > talking about the parsed data; the decoder requests the raw data.
> > >
> > > For the PPS and RPS, it is possible to reuse the slice header: just
> > > let the decoder know the offset from the bitstream buffer. I would
> > > suggest adding three properties (with the SPS) for them. But I think
> > > we need a method to mark an OUTPUT-side buffer for those aux data.
> > 
> > I'm quite confused about the hardware implementation then. From what
> > you're saying, it seems that it takes the raw bitstream elements rather
> > than parsed elements. Is it really a stateless implementation?
> > 
> > The stateless implementation was designed with the idea that only the
> > raw slice data should be passed in bitstream form to the decoder. For
> > H.264, it seems that some decoders also need the slice header in raw
> > bitstream form (because they take the full slice NAL unit), see the
> > discussions in this thread:
> > media: docs-rst: Document m2m stateless video decoder interface
> 
> Stateless just means it won't track the previous result, but I don't
> think you can define what data the hardware would need. Even if you
> just build a DPB for the decoder, it is still stateless; parsing
> less or more data from the bitstream doesn't stop a decoder from
> being a stateless decoder.

Yes fair enough, the format in which the hardware decoder takes the
bitstream parameters does not make it stateless or stateful per-se.
It's just that stateless decoders should have no particular reason for
parsing the bitstream on their own since the hardware can be designed
with registers for each relevant bitstream element to configure the
decoding pipeline. That's how GPU-based decoders are implemented
(VAAPI/VDPAU/NVDEC, etc.).

So the format we have agreed on so far for the stateless interface is
to pass parsed elements via v4l2 control structures.
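
For illustration, a minimal sketch of that model (the register offsets,
device struct and SPS field names here are hypothetical, assumed to
mirror the HEVC syntax elements carried by the proposed controls):

    #include <linux/io.h>

    #define REG_PIC_WIDTH	0x010
    #define REG_PIC_HEIGHT	0x014
    #define REG_BIT_DEPTH	0x018

    struct vdec_dev {
            void __iomem *base;
    };

    /* Each parsed bitstream element arrives in a v4l2 control and is
     * written straight to a decoder register; no in-kernel parsing. */
    static void vdec_write_sps(struct vdec_dev *dev,
                               const struct v4l2_ctrl_hevc_sps *sps)
    {
            writel(sps->pic_width_in_luma_samples,
                   dev->base + REG_PIC_WIDTH);
            writel(sps->pic_height_in_luma_samples,
                   dev->base + REG_PIC_HEIGHT);
            writel(sps->bit_depth_luma_minus8 + 8,
                   dev->base + REG_BIT_DEPTH);
    }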

If the hardware can only work by parsing the bitstream itself, I'm not
sure what the best solution would be. Reconstructing the bitstream in
the kernel is a pretty bad option, but so is parsing in the kernel or
having the data both in parsed and raw forms. Do you see another
possibility?

Cheers,

Paul

> > Can you detail exactly what the rockchip decoder absolutely needs in
> > raw bitstream format?
> > 
> > Cheers,
> > 
> > Paul
> > 
> > > > On 1/8/19 6:00 PM, Ayaka wrote:
> > > > Sent from my iPad
> > > > 
> > > > > On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski 
> > > > >  wrote:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > > On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
> > > > > > 
> > > > > > Sent from my iPad
> > > > > > 
> > > > > > > On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski 
> > > > > > >  wrote:
> > > > > > > 
> > > > > > > Hi,
> > > > > > > 
> > > > > > > > > On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
> > > > > > > > > On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> > > > > > > > > 
> > > > > > > > > > > +
> > > > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> > > > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER  0x02
> > > > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR        0x03
> > > > > > > > > > > +
> > > > > > > > > > > +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX          16
> > > > > > > > > > > +
> > > > > > > > > > > +struct v4l2_hevc_dpb_entry {
> > > > > > > > > > > +	__u32	buffer_tag;
> > > > > > > > > > > +	__u8	rps;
> > > > > > > > > > > +	__u8	field_pic;
> > > > > > > > > > > +	__u16	pic_order_cnt[2];
> > > > > > > > > > > +};
> > > > > > > > Please add a property for the reference index; if that rps
> > > > > > > > field is not used for it, some devices would request that
> > > > > > > > (not the Rockchip one). And Rockchip's VDPU1 and VDPU2 for
> > > > > > > > AVC would request a similar property.
> > > > > > > What exactly is that reference index? Is it a bitstream element or
> > > > > > > something deduced from the bitstream?
> > > > > > > 
> > > > > > picture order count (POC) for HEVC and frame_num in AVC. I
> > > > > > think it is the number used in list0 (P and B slices) and
> > > > > > list1 (B slices).
> > > > > The picture order count is already the last field of the DPB entry
> > > > > structure. There is one for each field picture.
> > > > As we are not sure whether there are field-coded slices or CTUs, I
> > > > would hold off on this part and everything else about fields.
> > > > > > > > Adding 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-24 Thread Ayaka


Sent from my iPad

> On Jan 24, 2019, at 6:27 PM, Paul Kocialkowski 
>  wrote:
> 
> Hi,
> 
>> On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
>> I forgot an important thing: the rkvdec and RK HEVC decoders request
>> the CABAC table, scaling list, picture parameter set and reference
>> pictures to be stored in one or several DMA buffers. I am not talking
>> about the parsed data; the decoder requests the raw data.
>> 
>> For the PPS and RPS, it is possible to reuse the slice header: just let
>> the decoder know the offset from the bitstream buffer. I would suggest
>> adding three properties (with the SPS) for them. But I think we need a
>> method to mark an OUTPUT-side buffer for those aux data.
> 
> I'm quite confused about the hardware implementation then. From what
> you're saying, it seems that it takes the raw bitstream elements rather
> than parsed elements. Is it really a stateless implementation?
> 
> The stateless implementation was designed with the idea that only the
> raw slice data should be passed in bitstream form to the decoder. For
> H.264, it seems that some decoders also need the slice header in raw
> bitstream form (because they take the full slice NAL unit), see the
> discussions in this thread:
> media: docs-rst: Document m2m stateless video decoder interface

Stateless just means it won't track previous results, but I don't think you 
can define what data the hardware would need. Even if you just build a dpb for 
the decoder, it is still stateless; parsing less or more data from the 
bitstream doesn't stop a decoder from being a stateless decoder.
> 
> Can you detail exactly what the rockchip decoder absolutely needs in
> raw bitstream format?
> 
> Cheers,
> 
> Paul
> 
>>> On 1/8/19 6:00 PM, Ayaka wrote:
>>> Sent from my iPad
>>> 
 On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski 
  wrote:
 
 Hi,
 
> On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
> 
> Sent from my iPad
> 
>> On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski 
>>  wrote:
>> 
>> Hi,
>> 
 On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
 On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
 Hi,
 
 On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
 
>> +
>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER  0x02
>> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR        0x03
>> +
>> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX  16
>> +
>> +struct v4l2_hevc_dpb_entry {
>> +   __u32   buffer_tag;
>> +   __u8    rps;
>> +   __u8    field_pic;
>> +   __u16   pic_order_cnt[2];
>> +};
>>> Please add a property for the reference index; if that rps field is not
>>> used for it, some devices would require one (not the Rockchip one). And
>>> Rockchip's VDPU1 and VDPU2 for AVC would require a similar property.
>> What exactly is that reference index? Is it a bitstream element or
>> something deduced from the bitstream?
>> 
> picture order count(POC) for HEVC and frame_num in AVC. I think it is
> the number used in list0(P slice and B slice) and list1(B slice).
 The picture order count is already the last field of the DPB entry
 structure. There is one for each field picture.
>>> As we are not sure whether field-coded slices or CTUs exist in HEVC, I 
>>> would hold off on this part and everything else related to fields.
>>> Add another buffer_tag for referring to the memory holding the motion
>>> vectors of each frame. Or a better method is to add metadata to each
>>> picture buffer, since the picture output is just the same as the
>>> original; the display won't care whether the motion vectors are written
>>> at the bottom of the picture or somewhere else.
>> The motion vectors are passed as part of the raw bitstream data, in the
>> slices. Is there a case where the motion vectors are coded differently?
> No, it is an additional cache for the decoder; even FFmpeg keeps such
> data. I think the Allwinner hardware must output it somewhere as well.
 Ah yes I see what you mean! This is handled internally by our driver
 and not exposed to userspace. I don't think it would be a good idea to
 expose this cache or request that userspace allocates it like a video
 buffer.
 
>>> No, usually the driver should allocate it, as userspace has no idea of 
>>> the size each device needs.
>>> But for advanced users, an application could fix a broken picture with 
>>> proper data or analyze object motion from it.
>>> So I would suggest attaching this information to a picture buffer as 
>>> metadata.
>> +
>> +struct v4l2_hevc_pred_weight_table {
>> +   __u8    luma_log2_weight_denom;
>> +   __s8    delta_chroma_log2_weight_denom;
>> +
>> +__s8

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-24 Thread Ayaka


Sent from my iPad

> On Jan 24, 2019, at 6:36 PM, Paul Kocialkowski 
>  wrote:
> 
> Hi,
> 
>> On Tue, 2019-01-08 at 18:00 +0800, Ayaka wrote:
>> 
>> Sent from my iPad
>> 
>>> On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski 
>>>  wrote:
>>> 
>>> Hi,
>>> 
 On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
 
 Sent from my iPad
 
> On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski 
>  wrote:
> 
> Hi,
> 
>>> On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
>>> On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
>>> Hi,
>>> 
>>> On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
>>> 
> +
> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER  0x02
> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR        0x03
> +
> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX  16
> +
> +struct v4l2_hevc_dpb_entry {
> +   __u32   buffer_tag;
> +   __u8    rps;
> +   __u8    field_pic;
> +   __u16   pic_order_cnt[2];
> +};
>> 
>> Please add a property for the reference index; if that rps field is not 
>> used for it, some devices would require one (not the Rockchip one). And 
>> Rockchip's VDPU1 and VDPU2 for AVC would require a similar property.
> 
> What exactly is that reference index? Is it a bitstream element or
> something deduced from the bitstream?
> 
 picture order count(POC) for HEVC and frame_num in AVC. I think it is
 the number used in list0(P slice and B slice) and list1(B slice).
>>> 
>>> The picture order count is already the last field of the DPB entry
>>> structure. There is one for each field picture.
>> As we are not sure whether field-coded slices or CTUs exist in HEVC, I
>> would hold off on this part and everything else related to fields.
> 
> I'm not sure what you meant here, sorry.
As we talked about on IRC, I am not sure field-coded pictures are supported in 
HEVC. And I don't know why there would be two picture order counts; a picture 
can only be used as a short-term or a long-term reference while decoding one 
picture.
> 
>> Add another buffer_tag for referring to the memory holding the motion 
>> vectors of each frame. Or a better method is to add metadata to each 
>> picture buffer, since the picture output is just the same as the 
>> original; the display won't care whether the motion vectors are written 
>> at the bottom of the picture or somewhere else.
> 
> The motion vectors are passed as part of the raw bitstream data, in the
> slices. Is there a case where the motion vectors are coded differently?
No, it is an additional cache for the decoder; even FFmpeg keeps such
data. I think the Allwinner hardware must output it somewhere as well.
>>> 
>>> Ah yes I see what you mean! This is handled internally by our driver
>>> and not exposed to userspace. I don't think it would be a good idea to
>>> expose this cache or request that userspace allocates it like a video
>>> buffer.
>>> 
>> No, usually the driver should allocate it, as userspace has no
>> idea of the size each device needs.
>> But for advanced users, an application could fix a broken picture with
>> proper data or analyze object motion from it.
>> So I would suggest attaching this information to a picture buffer as
>> metadata.
> 
> Right, the driver will allocate chunks of memory for the decoding
> metadata used by the hardware decoder.
> 
> Well, I don't think V4L2 has any mechanism to expose this data for now
> and since it's very specific to the hardware implementation, I guess
> the interest in having that is generally pretty low.
> 
> That's maybe something that could be added later if someone wants to
> work on it, but I think we are better off keeping this metadata hidden
> by the driver for now.
I am writing a V4L2 driver for Rockchip based on the previous vendor driver I 
sent to the mailing list. I think I can offer a better way to describe the 
metadata after that. But it needs to work in both drivers and userspace, so it 
will take some time.
> 
> +
> +struct v4l2_hevc_pred_weight_table {
> +   __u8    luma_log2_weight_denom;
> +   __s8    delta_chroma_log2_weight_denom;
> +
> +   __s8    delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> +   __s8    luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> +   __s8    delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +   __s8    chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +
> +   __s8    delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> +   __s8    luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> +   __s8    delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +   __s8    chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +};
> +
>> Those properties, I think, are not necessarily applicable: they fit 
>> Rockchip's 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-24 Thread Paul Kocialkowski
Hi,

On Tue, 2019-01-08 at 18:00 +0800, Ayaka wrote:
> 
> Sent from my iPad
> 
> > On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski 
> >  wrote:
> > 
> > Hi,
> > 
> > > On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
> > > 
> > > Sent from my iPad
> > > 
> > > > On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski 
> > > >  wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > > > On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
> > > > > > On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> > > > > > 
> > > > > > > > +
> > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER  0x02
> > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR        0x03
> > > > > > > > +
> > > > > > > > +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX  16
> > > > > > > > +
> > > > > > > > +struct v4l2_hevc_dpb_entry {
> > > > > > > > +   __u32   buffer_tag;
> > > > > > > > +   __u8    rps;
> > > > > > > > +   __u8    field_pic;
> > > > > > > > +   __u16   pic_order_cnt[2];
> > > > > > > > +};
> > > > > 
> > > > > Please add a property for the reference index; if that rps field is 
> > > > > not used for it, some devices would require one (not the Rockchip 
> > > > > one). And Rockchip's VDPU1 and VDPU2 for AVC would require a similar 
> > > > > property.
> > > > 
> > > > What exactly is that reference index? Is it a bitstream element or
> > > > something deduced from the bitstream?
> > > > 
> > > picture order count(POC) for HEVC and frame_num in AVC. I think it is
> > > the number used in list0(P slice and B slice) and list1(B slice).
> > 
> > The picture order count is already the last field of the DPB entry
> > structure. There is one for each field picture.
> As we are not sure whether field-coded slices or CTUs exist in HEVC, I
> would hold off on this part and everything else related to fields.

I'm not sure what you meant here, sorry.

> > > > > Add another buffer_tag for referring to the memory holding the 
> > > > > motion vectors of each frame. Or a better method is to add metadata 
> > > > > to each picture buffer, since the picture output is just the same as 
> > > > > the original; the display won't care whether the motion vectors are 
> > > > > written at the bottom of the picture or somewhere else.
> > > > 
> > > > The motion vectors are passed as part of the raw bitstream data, in the
> > > > slices. Is there a case where the motion vectors are coded differently?
> > > No, it is an additional cache for the decoder; even FFmpeg keeps such
> > > data. I think the Allwinner hardware must output it somewhere as well.
> > 
> > Ah yes I see what you mean! This is handled internally by our driver
> > and not exposed to userspace. I don't think it would be a good idea to
> > expose this cache or request that userspace allocates it like a video
> > buffer.
> > 
> No, usually the driver should allocate it, as userspace has no
> idea of the size each device needs.
> But for advanced users, an application could fix a broken picture with
> proper data or analyze object motion from it.
> So I would suggest attaching this information to a picture buffer as
> metadata.

Right, the driver will allocate chunks of memory for the decoding
metadata used by the hardware decoder.

Well, I don't think V4L2 has any mechanism to expose this data for now
and since it's very specific to the hardware implementation, I guess
the interest in having that is generally pretty low.

That's maybe something that could be added later if someone wants to
work on it, but I think we are better off keeping this metadata hidden
by the driver for now.

> > > > > > > > +
> > > > > > > > +struct v4l2_hevc_pred_weight_table {
> > > > > > > > +   __u8    luma_log2_weight_denom;
> > > > > > > > +   __s8    delta_chroma_log2_weight_denom;
> > > > > > > > +
> > > > > > > > +   __s8    delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > +   __s8    luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > +   __s8    delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > +   __s8    chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > +
> > > > > > > > +   __s8    delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > +   __s8    luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > +   __s8    delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > +   __s8    chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > > > +};
> > > > > > > > +
> > > > > Those properties, I think, are not necessarily applicable: they fit 
> > > > > Rockchip's device but may not work for the others.
> > > > 
> > > > Yes, it's possible that some of the elements are not necessary for some
> > > > decoders. What we want is to cover all the elements that might be
> > > > required for a 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-24 Thread Paul Kocialkowski
Hi,

On Thu, 2019-01-10 at 21:32 +0800, ayaka wrote:
> I forgot an important thing: the rkvdec and rk hevc decoders request the 
> cabac table, scaling list, picture parameter set and reference picture 
> set to be stored in one or several DMA buffers. I am not talking about 
> the parsed data; the decoder requests the raw data.
> 
> For the pps and rps, it is possible to reuse the slice header and just 
> let the decoder know the offset into the bitstream buffer, so I would 
> suggest adding three properties (with the sps) for them. But I think we 
> need a method to mark an OUTPUT side buffer for that aux data.

I'm quite confused about the hardware implementation then. From what
you're saying, it seems that it takes the raw bitstream elements rather
than parsed elements. Is it really a stateless implementation?

The stateless implementation was designed with the idea that only the
raw slice data should be passed in bitstream form to the decoder. For
H.264, it seems that some decoders also need the slice header in raw
bitstream form (because they take the full slice NAL unit), see the
discussions in this thread:
media: docs-rst: Document m2m stateless video decoder interface

Can you detail exactly what the rockchip decoder absolutely needs in
raw bitstream format?

Cheers,

Paul

> On 1/8/19 6:00 PM, Ayaka wrote:
> > Sent from my iPad
> > 
> > > On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski 
> > >  wrote:
> > > 
> > > Hi,
> > > 
> > > > On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
> > > > 
> > > > Sent from my iPad
> > > > 
> > > > > On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski 
> > > > >  wrote:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > > > On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
> > > > > > > On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> > > > > > > 
> > > > > > > > > +
> > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER  0x02
> > > > > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR        0x03
> > > > > > > > > +
> > > > > > > > > +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX  16
> > > > > > > > > +
> > > > > > > > > +struct v4l2_hevc_dpb_entry {
> > > > > > > > > +   __u32   buffer_tag;
> > > > > > > > > +   __u8    rps;
> > > > > > > > > +   __u8    field_pic;
> > > > > > > > > +   __u16   pic_order_cnt[2];
> > > > > > > > > +};
> > > > > > Please add a property for the reference index; if that rps field is
> > > > > > not used for it, some devices would require one (not the Rockchip
> > > > > > one). And Rockchip's VDPU1 and VDPU2 for AVC would require a similar
> > > > > > property.
> > > > > What exactly is that reference index? Is it a bitstream element or
> > > > > something deduced from the bitstream?
> > > > > 
> > > > picture order count(POC) for HEVC and frame_num in AVC. I think it is
> > > > the number used in list0(P slice and B slice) and list1(B slice).
> > > The picture order count is already the last field of the DPB entry
> > > structure. There is one for each field picture.
> > As we are not sure whether field-coded slices or CTUs exist in HEVC, I 
> > would hold off on this part and everything else related to fields.
> > > > > > Add another buffer_tag for referring to the memory holding the 
> > > > > > motion vectors of each frame. Or a better method is to add metadata 
> > > > > > to each picture buffer, since the picture output is just the same 
> > > > > > as the original; the display won't care whether the motion vectors 
> > > > > > are written at the bottom of the picture or somewhere else.
> > > > > The motion vectors are passed as part of the raw bitstream data, in 
> > > > > the
> > > > > slices. Is there a case where the motion vectors are coded 
> > > > > differently?
> > > > No, it is an additional cache for the decoder; even FFmpeg keeps such
> > > > data. I think the Allwinner hardware must output it somewhere as well.
> > > Ah yes I see what you mean! This is handled internally by our driver
> > > and not exposed to userspace. I don't think it would be a good idea to
> > > expose this cache or request that userspace allocates it like a video
> > > buffer.
> > > 
> > No, usually the driver should allocate it, as userspace has no idea of 
> > the size each device needs.
> > But for advanced users, an application could fix a broken picture with 
> > proper data or analyze object motion from it.
> > So I would suggest attaching this information to a picture buffer as 
> > metadata.
> > > > > > > > > +
> > > > > > > > > +struct v4l2_hevc_pred_weight_table {
> > > > > > > > > +   __u8    luma_log2_weight_denom;
> > > > > > > > > +   __s8    delta_chroma_log2_weight_denom;
> > > > > > > > > +
> > > > > > > > > +   __s8    delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > > > > +   __s8    luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-10 Thread ayaka
I forgot an important thing: the rkvdec and rk hevc decoders request the 
cabac table, scaling list, picture parameter set and reference picture 
set to be stored in one or several DMA buffers. I am not talking about 
the parsed data; the decoder requests the raw data.

For the pps and rps, it is possible to reuse the slice header and just 
let the decoder know the offset into the bitstream buffer, so I would 
suggest adding three properties (with the sps) for them. But I think we 
need a method to mark an OUTPUT side buffer for that aux data.
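
To make the suggestion concrete, a purely hypothetical sketch of such
offset properties could look like the structure below. This is not an
existing or proposed uAPI, just an illustration of the idea:

/*
 * Hypothetical illustration only: tell a decoder that parses raw bits
 * itself where the raw SPS, PPS and slice header start inside the
 * OUTPUT bitstream buffer.
 */
struct v4l2_ctrl_hevc_raw_offsets {
	__u32	sps_offset;		/* byte offset of the raw SPS NAL unit */
	__u32	pps_offset;		/* byte offset of the raw PPS NAL unit */
	__u32	slice_header_offset;	/* byte offset of the raw slice header */
};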


On 1/8/19 6:00 PM, Ayaka wrote:


Sent from my iPad


On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski  
wrote:

Hi,


On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:

Sent from my iPad


On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski  
wrote:

Hi,


On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
Hi,

On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:


+
+#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
+#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER  0x02
+#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR        0x03
+
+#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX  16
+
+struct v4l2_hevc_dpb_entry {
+   __u32   buffer_tag;
+   __u8    rps;
+   __u8    field_pic;
+   __u16   pic_order_cnt[2];
+};

Please add a property for the reference index; if that rps field is not
used for it, some devices would require one (not the Rockchip one). And
Rockchip's VDPU1 and VDPU2 for AVC would require a similar property.

What exactly is that reference index? Is it a bitstream element or
something deduced from the bitstream?


picture order count(POC) for HEVC and frame_num in AVC. I think it is
the number used in list0(P slice and B slice) and list1(B slice).

The picture order count is already the last field of the DPB entry
structure. There is one for each field picture.

As we are not sure whether field-coded slices or CTUs exist in HEVC, I would 
hold off on this part and everything else related to fields.

Add another buffer_tag for referring to the memory holding the motion
vectors of each frame. Or a better method is to add metadata to each
picture buffer, since the picture output is just the same as the original;
the display won't care whether the motion vectors are written at the
bottom of the picture or somewhere else.

The motion vectors are passed as part of the raw bitstream data, in the
slices. Is there a case where the motion vectors are coded differently?

No, it is an additional cache for the decoder; even FFmpeg keeps such
data. I think the Allwinner hardware must output it somewhere as well.

Ah yes I see what you mean! This is handled internally by our driver
and not exposed to userspace. I don't think it would be a good idea to
expose this cache or request that userspace allocates it like a video
buffer.


No, usually the driver should allocate it, as userspace has no idea of the 
size each device needs.
But for advanced users, an application could fix a broken picture with proper 
data or analyze object motion from it.
So I would suggest attaching this information to a picture buffer as 
metadata.

+
+struct v4l2_hevc_pred_weight_table {
+   __u8    luma_log2_weight_denom;
+   __s8    delta_chroma_log2_weight_denom;
+
+   __s8    delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+   __s8    luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+   __s8    delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+   __s8    chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+
+   __s8    delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+   __s8    luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+   __s8    delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+   __s8    chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+};
+

Those properties, I think, are not necessarily applicable: they fit
Rockchip's device but may not work for the others.

Yes, it's possible that some of the elements are not necessary for some
decoders. What we want is to cover all the elements that might be
required for a decoder.

I wonder whether Allwinner needs that; those sao flags are usually ignored
by decoders by design. But more is better than less: it is hard to
extend a v4l2 structure in the future, and maybe a new HEVC profile
will bring a new property. It is still too early for HEVC.

Yes this is used by our decoder. The idea is to have all the basic
bitstream elements in the structures (even if some decoders don't use
them all) and add others for extension as separate controls later.


+struct v4l2_ctrl_hevc_slice_params {
+   __u32   bit_size;
+   __u32   data_bit_offset;
+
+   /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
+   __u8    nal_unit_type;
+   __u8    nuh_temporal_id_plus1;
+
+   /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
+   __u8    slice_type;
+   __u8    colour_plane_id;

+   __u16   slice_pic_order_cnt;
+   __u8    slice_sao_luma_flag;
+ 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-08 Thread Ayaka


Sent from my iPad

> On Jan 8, 2019, at 4:38 PM, Paul Kocialkowski  
> wrote:
> 
> Hi,
> 
>> On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
>> 
>> Sent from my iPad
>> 
>>> On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski 
>>>  wrote:
>>> 
>>> Hi,
>>> 
> On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
> On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> Hi,
> 
> On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> 
>>> +
>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER  0x02
>>> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR        0x03
>>> +
>>> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX  16
>>> +
>>> +struct v4l2_hevc_dpb_entry {
>>> +   __u32   buffer_tag;
>>> +   __u8    rps;
>>> +   __u8    field_pic;
>>> +   __u16   pic_order_cnt[2];
>>> +};
 
Please add a property for the reference index; if that rps field is not 
used for it, some devices would require one (not the Rockchip one). And 
Rockchip's VDPU1 and VDPU2 for AVC would require a similar property.
>>> 
>>> What exactly is that reference index? Is it a bitstream element or
>>> something deduced from the bitstream?
>>> 
>> picture order count(POC) for HEVC and frame_num in AVC. I think it is
>> the number used in list0(P slice and B slice) and list1(B slice).
> 
> The picture order count is already the last field of the DPB entry
> structure. There is one for each field picture.
As we are not sure whether field-coded slices or CTUs exist in HEVC, I would 
hold off on this part and everything else related to fields.
> 
Add another buffer_tag for referring to the memory holding the motion 
vectors of each frame. Or a better method is to add metadata to each 
picture buffer, since the picture output is just the same as the original; 
the display won't care whether the motion vectors are written at the 
bottom of the picture or somewhere else.
>>> 
>>> The motion vectors are passed as part of the raw bitstream data, in the
>>> slices. Is there a case where the motion vectors are coded differently?
>> No, it is an additional cache for the decoder; even FFmpeg keeps such
>> data. I think the Allwinner hardware must output it somewhere as well.
> 
> Ah yes I see what you mean! This is handled internally by our driver
> and not exposed to userspace. I don't think it would be a good idea to
> expose this cache or request that userspace allocates it like a video
> buffer.
> 
No, usually the driver should allocate it, as userspace has no idea of the 
size each device needs.
But for advanced users, an application could fix a broken picture with proper 
data or analyze object motion from it.
So I would suggest attaching this information to a picture buffer as 
metadata.
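
As a rough sketch of the driver-side allocation being described here
(the names and the size formula are made up; the real size is
IP-specific):

#include <linux/dma-mapping.h>

struct mv_aux_buf {
	void		*cpu;
	dma_addr_t	dma;
	size_t		size;
};

/* Allocate a per-picture motion vector area; only the driver knows the size. */
static int alloc_mv_aux_buf(struct device *dev, struct mv_aux_buf *buf,
			    unsigned int blk_width, unsigned int blk_height)
{
	buf->size = blk_width * blk_height * 64;	/* hypothetical bytes per block */
	buf->cpu = dma_alloc_coherent(dev, buf->size, &buf->dma, GFP_KERNEL);
	return buf->cpu ? 0 : -ENOMEM;
}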
>>> +
>>> +struct v4l2_hevc_pred_weight_table {
>>> +   __u8    luma_log2_weight_denom;
>>> +   __s8    delta_chroma_log2_weight_denom;
>>> +
>>> +   __s8    delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>> +   __s8    luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>> +   __s8    delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>> +   __s8    chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>> +
>>> +   __s8    delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>> +   __s8    luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
>>> +   __s8    delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>> +   __s8    chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
>>> +};
>>> +
Those properties, I think, are not necessarily applicable: they fit 
Rockchip's device but may not work for the others.
>>> 
>>> Yes, it's possible that some of the elements are not necessary for some
>>> decoders. What we want is to cover all the elements that might be
>>> required for a decoder.
>> I wonder whether Allwinner needs that; those sao flags are usually
>> ignored by decoders by design. But more is better than less: it is hard
>> to extend a v4l2 structure in the future, and maybe a new HEVC profile
>> will bring a new property. It is still too early for HEVC.
> 
> Yes this is used by our decoder. The idea is to have all the basic
> bitstream elements in the structures (even if some decoders don't use
> them all) and add others for extension as separate controls later.
> 
>>> +struct v4l2_ctrl_hevc_slice_params {
>>> +   __u32   bit_size;
>>> +   __u32   data_bit_offset;
>>> +
>>> +   /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
>>> +   __u8    nal_unit_type;
>>> +   __u8    nuh_temporal_id_plus1;
>>> +
>>> +   /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
>>> +   __u8    slice_type;
>>> +   __u8    colour_plane_id;
 
>>> +   __u16   slice_pic_order_cnt;
>>> +   __u8    slice_sao_luma_flag;
>>> +

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-08 Thread Paul Kocialkowski
Hi,

On Tue, 2019-01-08 at 09:16 +0800, Ayaka wrote:
> 
> Sent from my iPad
> 
> > On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski 
> >  wrote:
> > 
> > Hi,
> > 
> > > On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
> > > > On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> > > > Hi,
> > > > 
> > > > On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> > > > 
> > > > > > +
> > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER  0x02
> > > > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR        0x03
> > > > > > +
> > > > > > +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX  16
> > > > > > +
> > > > > > +struct v4l2_hevc_dpb_entry {
> > > > > > +   __u32   buffer_tag;
> > > > > > +   __u8    rps;
> > > > > > +   __u8    field_pic;
> > > > > > +   __u16   pic_order_cnt[2];
> > > > > > +};
> > > 
> > > Please add a property for the reference index; if that rps field is not 
> > > used for it, some devices would require one (not the Rockchip one). And 
> > > Rockchip's VDPU1 and VDPU2 for AVC would require a similar property.
> > 
> > What exactly is that reference index? Is it a bitstream element or
> > something deduced from the bitstream?
> > 
> picture order count(POC) for HEVC and frame_num in AVC. I think it is
> the number used in list0(P slice and B slice) and list1(B slice).

The picture order count is already the last field of the DPB entry
structure. There is one for each field picture.

> > > Add another buffer_tag for referring to the memory holding the motion 
> > > vectors of each frame. Or a better method is to add metadata to each 
> > > picture buffer, since the picture output is just the same as the 
> > > original; the display won't care whether the motion vectors are written 
> > > at the bottom of the picture or somewhere else.
> > 
> > The motion vectors are passed as part of the raw bitstream data, in the
> > slices. Is there a case where the motion vectors are coded differently?
> No, it is an additional cache for the decoder; even FFmpeg keeps such
> data. I think the Allwinner hardware must output it somewhere as well.

Ah yes I see what you mean! This is handled internally by our driver
and not exposed to userspace. I don't think it would be a good idea to
expose this cache or request that userspace allocates it like a video
buffer.

> > > > > > +
> > > > > > +struct v4l2_hevc_pred_weight_table {
> > > > > > +   __u8    luma_log2_weight_denom;
> > > > > > +   __s8    delta_chroma_log2_weight_denom;
> > > > > > +
> > > > > > +   __s8    delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > +   __s8    luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > +   __s8    delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > +   __s8    chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > +
> > > > > > +   __s8    delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > +   __s8    luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > > > +   __s8    delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > +   __s8    chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > > > +};
> > > > > > +
> > > Those properties, I think, are not necessarily applicable: they fit 
> > > Rockchip's device but may not work for the others.
> > 
> > Yes, it's possible that some of the elements are not necessary for some
> > decoders. What we want is to cover all the elements that might be
> > required for a decoder.
> I wonder whether Allwinner needs that; those sao flags are usually
> ignored by decoders by design. But more is better than less: it is hard
> to extend a v4l2 structure in the future, and maybe a new HEVC profile
> will bring a new property. It is still too early for HEVC.

Yes this is used by our decoder. The idea is to have all the basic
bitstream elements in the structures (even if some decoders don't use
them all) and add others for extension as separate controls later.

> > > > > > +struct v4l2_ctrl_hevc_slice_params {
> > > > > > +   __u32   bit_size;
> > > > > > +   __u32   data_bit_offset;
> > > > > > +
> > > > > > +   /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
> > > > > > +   __u8    nal_unit_type;
> > > > > > +   __u8    nuh_temporal_id_plus1;
> > > > > > +
> > > > > > +   /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > > > +   __u8    slice_type;
> > > > > > +   __u8    colour_plane_id;
> > > 
> > > > > > +   __u16   slice_pic_order_cnt;
> > > > > > +   __u8    slice_sao_luma_flag;
> > > > > > +   __u8    slice_sao_chroma_flag;
> > > > > > +   __u8    slice_temporal_mvp_enabled_flag;
> > > > > > +   __u8    num_ref_idx_l0_active_minus1;
> > > > > > +   __u8    num_ref_idx_l1_active_minus1;
> > > Rockchip's decoder doesn't use this part.
> > > > > > +   __u8    mvd_l1_zero_flag;
> > > > > > +__u8

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-07 Thread Ayaka


Sent from my iPad

> On Jan 7, 2019, at 5:57 PM, Paul Kocialkowski  
> wrote:
> 
> Hi,
> 
>> On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
>>> On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
>>> Hi,
>>> 
>>> On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
>>> 
> +
> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER  0x02
> +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR        0x03
> +
> +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX  16
> +
> +struct v4l2_hevc_dpb_entry {
> +   __u32   buffer_tag;
> +   __u8    rps;
> +   __u8    field_pic;
> +   __u16   pic_order_cnt[2];
> +};
>> 
>> Please add a property for the reference index; if that rps field is not 
>> used for it, some devices would require one (not the Rockchip one). And 
>> Rockchip's VDPU1 and VDPU2 for AVC would require a similar property.
> 
> What exactly is that reference index? Is it a bitstream element or
> something deduced from the bitstream?
> 
picture order count(POC) for HEVC and frame_num in AVC. I think it is the 
number used in list0(P slice and B slice) and list1(B slice).
>> Add another buffer_tag for referring to the memory holding the motion 
>> vectors of each frame. Or a better method is to add metadata to each 
>> picture buffer, since the picture output is just the same as the 
>> original; the display won't care whether the motion vectors are written 
>> at the bottom of the picture or somewhere else.
> 
> The motion vectors are passed as part of the raw bitstream data, in the
> slices. Is there a case where the motion vectors are coded differently?
No, it is an additional cache for the decoder; even FFmpeg keeps such data. I 
think the Allwinner hardware must output it somewhere as well.
> 
> +
> +struct v4l2_hevc_pred_weight_table {
> +   __u8    luma_log2_weight_denom;
> +   __s8    delta_chroma_log2_weight_denom;
> +
> +   __s8    delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> +   __s8    luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> +   __s8    delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +   __s8    chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +
> +   __s8    delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> +   __s8    luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> +   __s8    delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +   __s8    chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> +};
> +
>> Those properties, I think, are not necessarily applicable: they fit 
>> Rockchip's device but may not work for the others.
> 
> Yes, it's possible that some of the elements are not necessary for some
> decoders. What we want is to cover all the elements that might be
> required for a decoder.
I wonder whether Allwinner needs that; those sao flags are usually ignored by 
decoders by design. But more is better than less: it is hard to extend a v4l2 
structure in the future, and maybe a new HEVC profile will bring a new 
property. It is still too early for HEVC.
> 
> +struct v4l2_ctrl_hevc_slice_params {
> +   __u32   bit_size;
> +   __u32   data_bit_offset;
> +
> +   /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
> +   __u8    nal_unit_type;
> +   __u8    nuh_temporal_id_plus1;
> +
> +   /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> +   __u8    slice_type;
> +   __u8    colour_plane_id;
>> 
> +   __u16   slice_pic_order_cnt;
> +   __u8    slice_sao_luma_flag;
> +   __u8    slice_sao_chroma_flag;
> +   __u8    slice_temporal_mvp_enabled_flag;
> +   __u8    num_ref_idx_l0_active_minus1;
> +   __u8    num_ref_idx_l1_active_minus1;
>> Rockchip's decoder doesn't use this part.
> +   __u8    mvd_l1_zero_flag;
> +   __u8    cabac_init_flag;
> +   __u8    collocated_from_l0_flag;
> +   __u8    collocated_ref_idx;
> +   __u8    five_minus_max_num_merge_cand;
> +   __u8    use_integer_mv_flag;
> +   __s8    slice_qp_delta;
> +   __s8    slice_cb_qp_offset;
> +   __s8    slice_cr_qp_offset;
> +   __s8    slice_act_y_qp_offset;
> +   __s8    slice_act_cb_qp_offset;
> +   __s8    slice_act_cr_qp_offset;
> +   __u8    slice_deblocking_filter_disabled_flag;
> +   __s8    slice_beta_offset_div2;
> +   __s8    slice_tc_offset_div2;
> +   __u8    slice_loop_filter_across_slices_enabled_flag;
> +
> +   /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
> +   __u8    pic_struct;
>> I think the decoder doesn't care about this; it is used for display.
> 
> The purpose of this field is to indicate whether the current picture is
> a progressive frame or an interlaced field picture, which is useful for
> decoding.
> 
> At least our decoder has a register 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-07 Thread Paul Kocialkowski
Hi,

On Mon, 2019-01-07 at 11:49 +0800, Randy Li wrote:
> On 12/12/18 8:51 PM, Paul Kocialkowski wrote:
> > Hi,
> > 
> > On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> > 
> > > > +
> > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
> > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER  0x02
> > > > +#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR        0x03
> > > > +
> > > > +#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX  16
> > > > +
> > > > +struct v4l2_hevc_dpb_entry {
> > > > +   __u32   buffer_tag;
> > > > +   __u8rps;
> > > > +   __u8field_pic;
> > > > +   __u16   pic_order_cnt[2];
> > > > +};
> 
> Please add a property for the reference index; if that rps field is not 
> used for it, some devices would require one (not the Rockchip one). And 
> Rockchip's VDPU1 and VDPU2 for AVC would require a similar property.

What exactly is that reference index? Is it a bitstream element or
something deduced from the bitstream?

> Add another buffer_tag for referring to the memory holding the motion 
> vectors of each frame. Or a better method is to add metadata to each 
> picture buffer, since the picture output is just the same as the 
> original; the display won't care whether the motion vectors are written 
> at the bottom of the picture or somewhere else.

The motion vectors are passed as part of the raw bitstream data, in the
slices. Is there a case where the motion vectors are coded differently?

> > > > +
> > > > +struct v4l2_hevc_pred_weight_table {
> > > > +   __u8    luma_log2_weight_denom;
> > > > +   __s8    delta_chroma_log2_weight_denom;
> > > > +
> > > > +   __s8    delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > +   __s8    luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > +   __s8    delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > +   __s8    chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > +
> > > > +   __s8    delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > +   __s8    luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
> > > > +   __s8    delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > +   __s8    chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
> > > > +};
> > > > +
> Those properties, I think, are not necessarily applicable: they fit 
> Rockchip's device but may not work for the others.

Yes, it's possible that some of the elements are not necessary for some
decoders. What we want is to cover all the elements that might be
required for a decoder.

> > > > +struct v4l2_ctrl_hevc_slice_params {
> > > > +   __u32   bit_size;
> > > > +   __u32   data_bit_offset;
> > > > +
> > > > +   /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
> > > > +   __u8    nal_unit_type;
> > > > +   __u8    nuh_temporal_id_plus1;
> > > > +
> > > > +   /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > +   __u8    slice_type;
> > > > +   __u8    colour_plane_id;
> 
> > > > +   __u16   slice_pic_order_cnt;
> > > > +   __u8    slice_sao_luma_flag;
> > > > +   __u8    slice_sao_chroma_flag;
> > > > +   __u8    slice_temporal_mvp_enabled_flag;
> > > > +   __u8    num_ref_idx_l0_active_minus1;
> > > > +   __u8    num_ref_idx_l1_active_minus1;
> Rockchip's decoder doesn't use this part.
> > > > +   __u8    mvd_l1_zero_flag;
> > > > +   __u8    cabac_init_flag;
> > > > +   __u8    collocated_from_l0_flag;
> > > > +   __u8    collocated_ref_idx;
> > > > +   __u8    five_minus_max_num_merge_cand;
> > > > +   __u8    use_integer_mv_flag;
> > > > +   __s8    slice_qp_delta;
> > > > +   __s8    slice_cb_qp_offset;
> > > > +   __s8    slice_cr_qp_offset;
> > > > +   __s8    slice_act_y_qp_offset;
> > > > +   __s8    slice_act_cb_qp_offset;
> > > > +   __s8    slice_act_cr_qp_offset;
> > > > +   __u8    slice_deblocking_filter_disabled_flag;
> > > > +   __s8    slice_beta_offset_div2;
> > > > +   __s8    slice_tc_offset_div2;
> > > > +   __u8    slice_loop_filter_across_slices_enabled_flag;
> > > > +
> > > > +   /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
> > > > +   __u8    pic_struct;
> I think the decoder doesn't care about this; it is used for display.

The purpose of this field is to indicate whether the current picture is
a progressive frame or an interlaced field picture, which is useful for
decoding.

At least our decoder has a register field to indicate frame/top
field/bottom field, so we certainly need to keep the info around.
Looking at the spec and the ffmpeg implementation, it looks like this
flag of the bitstream is the usual way to report field coding.
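
For illustration, a driver could translate the unambiguous pic_struct
values into such a register field roughly like this. The register
encoding is hypothetical; only values 0, 1 and 2 (frame, top field,
bottom field) from the HEVC picture timing SEI are handled, and the
paired/repeated field values are left out:

#include <linux/errno.h>
#include <linux/types.h>

enum hw_field_mode { HW_FRAME, HW_TOP_FIELD, HW_BOTTOM_FIELD };

/* Map the frame/top/bottom pic_struct values to a made-up register enum. */
static int pic_struct_to_field_mode(__u8 pic_struct, enum hw_field_mode *mode)
{
	switch (pic_struct) {
	case 0: *mode = HW_FRAME; return 0;
	case 1: *mode = HW_TOP_FIELD; return 0;
	case 2: *mode = HW_BOTTOM_FIELD; return 0;
	default: return -EINVAL; /* paired/repeated field cases need more context */
	}
}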

Cheers,

Paul

> > > > +
> > > > +   /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
> > > > +   

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2019-01-06 Thread Randy Li


On 12/12/18 8:51 PM, Paul Kocialkowski wrote:

Hi,

On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:


+
+#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_BEFORE 0x01
+#define V4L2_HEVC_DPB_ENTRY_RPS_ST_CURR_AFTER  0x02
+#define V4L2_HEVC_DPB_ENTRY_RPS_LT_CURR        0x03
+
+#define V4L2_HEVC_DPB_ENTRIES_NUM_MAX  16
+
+struct v4l2_hevc_dpb_entry {
+   __u32   buffer_tag;
+   __u8rps;
+   __u8field_pic;
+   __u16   pic_order_cnt[2];
+};


Please add a property for the reference index; if that rps field is not 
used for it, some devices would require one (not the Rockchip one). And 
Rockchip's VDPU1 and VDPU2 for AVC would require a similar property.
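
To illustrate what a reference-index style register would consume, here
is a small sketch (assuming the dpb[] array from the structure above)
that resolves a reference picture's POC to its DPB slot:

/* Find the DPB slot whose picture order count matches the given POC. */
static int poc_to_dpb_index(const struct v4l2_hevc_dpb_entry *dpb,
			    unsigned int num_entries, __u16 poc)
{
	unsigned int i;

	for (i = 0; i < num_entries; i++)
		if (dpb[i].pic_order_cnt[0] == poc)
			return i;

	return -1; /* not in the DPB: bitstream or bookkeeping error */
}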


Add another buffer_tag for referring to the memory holding the motion 
vectors of each frame. Or a better method is to add metadata to each 
picture buffer, since the picture output is just the same as the 
original; the display won't care whether the motion vectors are written 
at the bottom of the picture or somewhere else.




+
+struct v4l2_hevc_pred_weight_table {
+   __u8    luma_log2_weight_denom;
+   __s8    delta_chroma_log2_weight_denom;
+
+   __s8    delta_luma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+   __s8    luma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+   __s8    delta_chroma_weight_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+   __s8    chroma_offset_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+
+   __s8    delta_luma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+   __s8    luma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+   __s8    delta_chroma_weight_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+   __s8    chroma_offset_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX][2];
+};
+
Those properties, I think, are not necessarily applicable: they fit 
Rockchip's device but may not work for the others.
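
For what it's worth, a driver that does need these fields would derive
the effective weights from the deltas roughly as follows (a sketch of
the HEVC weighted-prediction derivation, list 0 luma only; the helper
name is illustrative):

/* LumaWeightL0[i] = (1 << luma_log2_weight_denom) + delta_luma_weight_l0[i] */
static void derive_l0_luma_weights(const struct v4l2_hevc_pred_weight_table *w,
				   unsigned int num_refs, int *luma_weight)
{
	unsigned int i;

	for (i = 0; i < num_refs; i++)
		luma_weight[i] = (1 << w->luma_log2_weight_denom) +
				 w->delta_luma_weight_l0[i];
}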

+struct v4l2_ctrl_hevc_slice_params {
+   __u32   bit_size;
+   __u32   data_bit_offset;
+
+   /* ISO/IEC 23008-2, ITU-T Rec. H.265: NAL unit header */
+   __u8    nal_unit_type;
+   __u8    nuh_temporal_id_plus1;
+
+   /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
+   __u8    slice_type;
+   __u8    colour_plane_id;

+   __u16   slice_pic_order_cnt;
+   __u8    slice_sao_luma_flag;
+   __u8    slice_sao_chroma_flag;
+   __u8    slice_temporal_mvp_enabled_flag;
+   __u8    num_ref_idx_l0_active_minus1;
+   __u8    num_ref_idx_l1_active_minus1;

Rockchip's decoder doesn't use this part.

+   __u8    mvd_l1_zero_flag;
+   __u8    cabac_init_flag;
+   __u8    collocated_from_l0_flag;
+   __u8    collocated_ref_idx;
+   __u8    five_minus_max_num_merge_cand;
+   __u8    use_integer_mv_flag;
+   __s8    slice_qp_delta;
+   __s8    slice_cb_qp_offset;
+   __s8    slice_cr_qp_offset;
+   __s8    slice_act_y_qp_offset;
+   __s8    slice_act_cb_qp_offset;
+   __s8    slice_act_cr_qp_offset;
+   __u8    slice_deblocking_filter_disabled_flag;
+   __s8    slice_beta_offset_div2;
+   __s8    slice_tc_offset_div2;
+   __u8    slice_loop_filter_across_slices_enabled_flag;
+
+   /* ISO/IEC 23008-2, ITU-T Rec. H.265: Picture timing SEI message */
+   __u8    pic_struct;

I think the decoder doesn't care about this; it is used for display.

+
+   /* ISO/IEC 23008-2, ITU-T Rec. H.265: General slice segment header */
+   struct v4l2_hevc_dpb_entry dpb[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+   __u8    num_active_dpb_entries;
+   __u8    ref_idx_l0[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+   __u8    ref_idx_l1[V4L2_HEVC_DPB_ENTRIES_NUM_MAX];
+
+   __u8    num_rps_poc_st_curr_before;
+   __u8    num_rps_poc_st_curr_after;
+   __u8    num_rps_poc_lt_curr;
+
+   /* ISO/IEC 23008-2, ITU-T Rec. H.265: Weighted prediction parameter */
+   struct v4l2_hevc_pred_weight_table pred_weight_table;
+};
+
  #endif






Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2018-12-12 Thread Paul Kocialkowski
Hi,

On Wed, 2018-12-05 at 21:59 +0100, Jernej Škrabec wrote:
> Hi!
> 
> Dne petek, 23. november 2018 ob 14:02:08 CET je Paul Kocialkowski napisal(a):
> > This introduces the required definitions for HEVC decoding support with
> > stateless VPUs. The controls associated to the HEVC slice format provide
> > the required meta-data for decoding slices extracted from the bitstream.
> > 
> > This interface comes with the following limitations:
> > * No custom quantization matrices (scaling lists);
> > * Support for a single temporal layer only;
> > * No slice entry point offsets support;
> > * No conformance window support;
> > * No VUI parameters support;
> > * No support for SPS extensions: range, multilayer, 3d, scc, 4 bits;
> > * No support for PPS extensions: range, multilayer, 3d, scc, 4 bits.
> > 
> > Signed-off-by: Paul Kocialkowski 
> > ---
> 
> 
> 
> > diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
> > index e96c453208e8..9af17815ecc3 100644
> > --- a/drivers/media/v4l2-core/v4l2-ctrls.c
> > +++ b/drivers/media/v4l2-core/v4l2-ctrls.c
> > @@ -913,6 +913,9 @@ const char *v4l2_ctrl_get_name(u32 id)
> > 	case V4L2_CID_MPEG_VIDEO_HEVC_SIZE_OF_LENGTH_FIELD: return "HEVC Size of Length Field";
> > 	case V4L2_CID_MPEG_VIDEO_REF_NUMBER_FOR_PFRAMES: return "Reference Frames for a P-Frame";
> > 	case V4L2_CID_MPEG_VIDEO_PREPEND_SPSPPS_TO_IDR: return "Prepend SPS and PPS to IDR";
> > +	case V4L2_CID_MPEG_VIDEO_HEVC_SPS: return "HEVC Sequence Parameter Set";
> > +	case V4L2_CID_MPEG_VIDEO_HEVC_PPS: return "HEVC Picture Parameter Set";
> > +	case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS: return "HEVC Slice Parameters";
> > 
> > /* CAMERA controls */
> > /* Keep the order of the 'case's the same as in v4l2-controls.h! */
> > @@ -1320,6 +1323,15 @@ void v4l2_ctrl_fill(u32 id, const char **name, enum v4l2_ctrl_type *type,
> > 	case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:
> > *type = V4L2_CTRL_TYPE_H264_DECODE_PARAMS;
> > break;
> > +   case V4L2_CID_MPEG_VIDEO_HEVC_SPS:
> > +   *type = V4L2_CTRL_TYPE_HEVC_SPS;
> > +   break;
> > +   case V4L2_CID_MPEG_VIDEO_HEVC_PPS:
> > +   *type = V4L2_CTRL_TYPE_HEVC_PPS;
> > +   break;
> > +   case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS:
> > +   *type = V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS;
> > +   break;
> > default:
> > *type = V4L2_CTRL_TYPE_INTEGER;
> > break;
> > @@ -1692,6 +1704,11 @@ static int std_validate(const struct v4l2_ctrl *ctrl, u32 idx,
> > 	case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> > return 0;
> > 
> > +   case V4L2_CTRL_TYPE_HEVC_SPS:
> > +   case V4L2_CTRL_TYPE_HEVC_PPS:
> > +   case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
> > +   return 0;
> > +
> > default:
> > return -EINVAL;
> > }
> > @@ -2287,6 +2304,15 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
> > 	case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
> > elem_size = sizeof(struct v4l2_ctrl_h264_decode_param);
> > break;
> > +   case V4L2_CTRL_TYPE_HEVC_SPS:
> > +   elem_size = sizeof(struct v4l2_ctrl_hevc_sps);
> > +   break;
> > +   case V4L2_CTRL_TYPE_HEVC_PPS:
> > +   elem_size = sizeof(struct v4l2_ctrl_hevc_pps);
> > +   break;
> > +   case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
> > +   elem_size = sizeof(struct v4l2_ctrl_hevc_slice_params);
> > +   break;
> > default:
> > if (type < V4L2_CTRL_COMPOUND_TYPES)
> > elem_size = sizeof(s32);
> > diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> > index aa63f1794272..7bec91c6effe 100644
> > --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> > +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> > @@ -1321,6 +1321,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> > case V4L2_PIX_FMT_VP8:  descr = "VP8"; break;
> > case V4L2_PIX_FMT_VP9:  descr = "VP9"; break;
> > 	case V4L2_PIX_FMT_HEVC: descr = "HEVC"; break; /* aka H.265 */
> > +	case V4L2_PIX_FMT_HEVC_SLICE:	descr = "HEVC Parsed Slice Data"; break;
> > 	case V4L2_PIX_FMT_FWHT: descr = "FWHT"; break; /* used in vicodec */
> > case V4L2_PIX_FMT_CPIA1:descr = "GSPCA CPiA YUV"; break;
> > case V4L2_PIX_FMT_WNVA: descr = "WNVA"; break;
> > diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
> > index b4ca95710d2d..11664c5c3706 100644
> > --- a/include/media/v4l2-ctrls.h
> > +++ b/include/media/v4l2-ctrls.h
> > @@ -48,6 +48,9 @@ struct poll_table_struct;
> >  * @p_h264_scal_mtrx:	Pointer to a struct v4l2_ctrl_h264_scaling_matrix.
> >  * @p_h264_slice_param:	Pointer to a struct 

Re: [linux-sunxi] [PATCH v2 1/2] media: v4l: Add definitions for the HEVC slice format and controls

2018-12-05 Thread Jernej Škrabec
Hi!

Dne petek, 23. november 2018 ob 14:02:08 CET je Paul Kocialkowski napisal(a):
> This introduces the required definitions for HEVC decoding support with
> stateless VPUs. The controls associated to the HEVC slice format provide
> the required meta-data for decoding slices extracted from the bitstream.
> 
> This interface comes with the following limitations:
> * No custom quantization matrices (scaling lists);
> * Support for a single temporal layer only;
> * No slice entry point offsets support;
> * No conformance window support;
> * No VUI parameters support;
> * No support for SPS extensions: range, multilayer, 3d, scc, 4 bits;
> * No support for PPS extensions: range, multilayer, 3d, scc, 4 bits.
> 
> Signed-off-by: Paul Kocialkowski 
> ---



> diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
> index e96c453208e8..9af17815ecc3 100644
> --- a/drivers/media/v4l2-core/v4l2-ctrls.c
> +++ b/drivers/media/v4l2-core/v4l2-ctrls.c
> @@ -913,6 +913,9 @@ const char *v4l2_ctrl_get_name(u32 id)
> 	case V4L2_CID_MPEG_VIDEO_HEVC_SIZE_OF_LENGTH_FIELD: return "HEVC Size of Length Field";
> 	case V4L2_CID_MPEG_VIDEO_REF_NUMBER_FOR_PFRAMES: return "Reference Frames for a P-Frame";
> 	case V4L2_CID_MPEG_VIDEO_PREPEND_SPSPPS_TO_IDR: return "Prepend SPS and PPS to IDR";
> +	case V4L2_CID_MPEG_VIDEO_HEVC_SPS: return "HEVC Sequence Parameter Set";
> +	case V4L2_CID_MPEG_VIDEO_HEVC_PPS: return "HEVC Picture Parameter Set";
> +	case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS: return "HEVC Slice Parameters";
> 
>   /* CAMERA controls */
>   /* Keep the order of the 'case's the same as in v4l2-controls.h! */
> @@ -1320,6 +1323,15 @@ void v4l2_ctrl_fill(u32 id, const char **name, enum v4l2_ctrl_type *type,
> 	case V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAMS:
>   *type = V4L2_CTRL_TYPE_H264_DECODE_PARAMS;
>   break;
> + case V4L2_CID_MPEG_VIDEO_HEVC_SPS:
> + *type = V4L2_CTRL_TYPE_HEVC_SPS;
> + break;
> + case V4L2_CID_MPEG_VIDEO_HEVC_PPS:
> + *type = V4L2_CTRL_TYPE_HEVC_PPS;
> + break;
> + case V4L2_CID_MPEG_VIDEO_HEVC_SLICE_PARAMS:
> + *type = V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS;
> + break;
>   default:
>   *type = V4L2_CTRL_TYPE_INTEGER;
>   break;
> @@ -1692,6 +1704,11 @@ static int std_validate(const struct v4l2_ctrl *ctrl, u32 idx,
> 	case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
>   return 0;
> 
> + case V4L2_CTRL_TYPE_HEVC_SPS:
> + case V4L2_CTRL_TYPE_HEVC_PPS:
> + case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
> + return 0;
> +
>   default:
>   return -EINVAL;
>   }
> @@ -2287,6 +2304,15 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
> 	case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
>   elem_size = sizeof(struct v4l2_ctrl_h264_decode_param);
>   break;
> + case V4L2_CTRL_TYPE_HEVC_SPS:
> + elem_size = sizeof(struct v4l2_ctrl_hevc_sps);
> + break;
> + case V4L2_CTRL_TYPE_HEVC_PPS:
> + elem_size = sizeof(struct v4l2_ctrl_hevc_pps);
> + break;
> + case V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS:
> + elem_size = sizeof(struct v4l2_ctrl_hevc_slice_params);
> + break;
>   default:
>   if (type < V4L2_CTRL_COMPOUND_TYPES)
>   elem_size = sizeof(s32);
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> index aa63f1794272..7bec91c6effe 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -1321,6 +1321,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> case V4L2_PIX_FMT_VP8:descr = "VP8"; break;
>   case V4L2_PIX_FMT_VP9:  descr = "VP9"; break;
> 	case V4L2_PIX_FMT_HEVC: descr = "HEVC"; break; /* aka H.265 */
> +	case V4L2_PIX_FMT_HEVC_SLICE:	descr = "HEVC Parsed Slice Data"; break;
> 	case V4L2_PIX_FMT_FWHT: descr = "FWHT"; break; /* used in vicodec */
>   case V4L2_PIX_FMT_CPIA1:descr = "GSPCA CPiA YUV"; break;
>   case V4L2_PIX_FMT_WNVA: descr = "WNVA"; break;
> diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
> index b4ca95710d2d..11664c5c3706 100644
> --- a/include/media/v4l2-ctrls.h
> +++ b/include/media/v4l2-ctrls.h
> @@ -48,6 +48,9 @@ struct poll_table_struct;
>  * @p_h264_scal_mtrx:	Pointer to a struct v4l2_ctrl_h264_scaling_matrix.
>  * @p_h264_slice_param:	Pointer to a struct v4l2_ctrl_h264_slice_param.
>   * @p_h264_decode_param: Pointer to a struct v4l2_ctrl_h264_decode_param.
> + * @p_hevc_sps:  Pointer to an HEVC sequence