Re: New DVB-Statistics API - please vote

2010-02-03 Thread Mauro Carvalho Chehab
Mauro Carvalho Chehab wrote:

 after the last thread which asked about signal statistics details
 degenerated into a discussion about the technical possibilites for
 implementing an entirely new API, which lead to nothing so far, I wanted
 to open a new thread to bring this forward. Maybe some more people can
 give their votes for the different options

Only me and Manu manifested with opinions on this thread. Not sure why
nobody else gave their comments. Maybe all interested people just decided 
to take a long vacation and are not listening to their emails ;)

Seriously, from what I understand, this is an API improvement and we need
to take a decision on it. So, your opinions are important.

---

Let me draw a summary of this subject, trying to be impartial.

The original proposal were made by Manu. My proposal is derived from Manu's
original one, both will equally provide the same features. 

The main difference is that Manu's proposal use a struct to get the 
statistics while my proposal uses DVBS2API.

With both API proposals, all values are get at the same time by the driver.
the DVBS2API adds a small delay to fill the fields, but the extra delay is
insignificant, when compared with the I/O delays needed to retrieve the 
values from the hardware.

Due to the usage of DVBS2API, it is possible to retrieve a subset of statistics.
When obtaining a subset, the DVBS2API latency is considerable faster, as less
data needed to be transfered from the hardware.

The DVBS2API also offers the possibility of expanding the statistics group, 
since
it is not rigid as an struct.

One criteria that should be reminded is that, according with Linux Kernel rules,
any userspace API is stable. This means that applications compiled against an
older API version should keep running with the latest kernel. So, whatever 
decided,
the statistics API should always maintain backward compatibility.

---

During the end of the year, I did some work with an ISDB-T driver for Siano, and
I realized that the usage of the proposed struct won't fit well for ISDB-T. The
reason is that, on ISDB-T, the transmission uses up to 3 hierarchical layers.
Each layer may have different OFDM parameters, so the devices (at least, this 
is the 
case for Siano) has a group of statistics per layer.

I'm afraid that newer standards may also bring different ways to present 
statistics, and
the current proposal won't fit well.

So, in my opinion, if it is chosen any struct-based approach, we'll have a bad 
time to
maintain it, as it won't fit into all cases, and we'll need to add some tricks 
to extend
the struct.

So, my vote is for the DVBS2API approach, where a new group of statistics can 
easily be added,
on an elegant way, without needing of re-discussing the better API or to find a 
way to extend
the size of an struct without breaking backward compatibility.

Cheers,
Mauro
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New DVB-Statistics API - please vote

2010-02-03 Thread Hans Verkuil

 Mauro Carvalho Chehab wrote:

 after the last thread which asked about signal statistics details
 degenerated into a discussion about the technical possibilites for
 implementing an entirely new API, which lead to nothing so far, I
 wanted
 to open a new thread to bring this forward. Maybe some more people can
 give their votes for the different options

 Only me and Manu manifested with opinions on this thread. Not sure why
 nobody else gave their comments. Maybe all interested people just decided
 to take a long vacation and are not listening to their emails ;)

 Seriously, from what I understand, this is an API improvement and we need
 to take a decision on it. So, your opinions are important.

 ---

 Let me draw a summary of this subject, trying to be impartial.

 The original proposal were made by Manu. My proposal is derived from
 Manu's
 original one, both will equally provide the same features.

 The main difference is that Manu's proposal use a struct to get the
 statistics while my proposal uses DVBS2API.

 With both API proposals, all values are get at the same time by the
 driver.
 the DVBS2API adds a small delay to fill the fields, but the extra delay is
 insignificant, when compared with the I/O delays needed to retrieve the
 values from the hardware.

 Due to the usage of DVBS2API, it is possible to retrieve a subset of
 statistics.
 When obtaining a subset, the DVBS2API latency is considerable faster, as
 less
 data needed to be transfered from the hardware.

 The DVBS2API also offers the possibility of expanding the statistics
 group, since
 it is not rigid as an struct.

 One criteria that should be reminded is that, according with Linux Kernel
 rules,
 any userspace API is stable. This means that applications compiled against
 an
 older API version should keep running with the latest kernel. So, whatever
 decided,
 the statistics API should always maintain backward compatibility.

 ---

 During the end of the year, I did some work with an ISDB-T driver for
 Siano, and
 I realized that the usage of the proposed struct won't fit well for
 ISDB-T. The
 reason is that, on ISDB-T, the transmission uses up to 3 hierarchical
 layers.
 Each layer may have different OFDM parameters, so the devices (at least,
 this is the
 case for Siano) has a group of statistics per layer.

 I'm afraid that newer standards may also bring different ways to present
 statistics, and
 the current proposal won't fit well.

 So, in my opinion, if it is chosen any struct-based approach, we'll have a
 bad time to
 maintain it, as it won't fit into all cases, and we'll need to add some
 tricks to extend
 the struct.

 So, my vote is for the DVBS2API approach, where a new group of statistics
 can easily be added,
 on an elegant way, without needing of re-discussing the better API or to
 find a way to extend
 the size of an struct without breaking backward compatibility.

From a purely technical standpoint the DVBS2API is by definition more
flexible and future-proof. The V4L API has taken the same approach with
controls (basically exactly the same mechanism). Should it be necessary in
the future to set multiple properties atomically, then the same technique
as V4L can be used (see VIDIOC_S_EXT_CTRLS).

The alternative is to make structs with lots of reserved fields. It
depends on how predictable the API is expected to be. Sometimes you can be
reasonably certain that there won't be too many additions in the future
and then using reserved fields is perfectly OK.

Just my 5 cents based on my V4L experience.

Regards,

  Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG Telecom

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New DVB-Statistics API - please vote

2010-02-03 Thread Manu Abraham
On Wed, Feb 3, 2010 at 2:40 PM, Hans Verkuil hverk...@xs4all.nl wrote:

 Mauro Carvalho Chehab wrote:

 after the last thread which asked about signal statistics details
 degenerated into a discussion about the technical possibilites for
 implementing an entirely new API, which lead to nothing so far, I
 wanted
 to open a new thread to bring this forward. Maybe some more people can
 give their votes for the different options

 Only me and Manu manifested with opinions on this thread. Not sure why
 nobody else gave their comments. Maybe all interested people just decided
 to take a long vacation and are not listening to their emails ;)

 Seriously, from what I understand, this is an API improvement and we need
 to take a decision on it. So, your opinions are important.

 ---

 Let me draw a summary of this subject, trying to be impartial.

 The original proposal were made by Manu. My proposal is derived from
 Manu's
 original one, both will equally provide the same features.

 The main difference is that Manu's proposal use a struct to get the
 statistics while my proposal uses DVBS2API.

 With both API proposals, all values are get at the same time by the
 driver.
 the DVBS2API adds a small delay to fill the fields, but the extra delay is
 insignificant, when compared with the I/O delays needed to retrieve the
 values from the hardware.

 Due to the usage of DVBS2API, it is possible to retrieve a subset of
 statistics.
 When obtaining a subset, the DVBS2API latency is considerable faster, as
 less
 data needed to be transfered from the hardware.

 The DVBS2API also offers the possibility of expanding the statistics
 group, since
 it is not rigid as an struct.

 One criteria that should be reminded is that, according with Linux Kernel
 rules,
 any userspace API is stable. This means that applications compiled against
 an
 older API version should keep running with the latest kernel. So, whatever
 decided,
 the statistics API should always maintain backward compatibility.

 ---

 During the end of the year, I did some work with an ISDB-T driver for
 Siano, and
 I realized that the usage of the proposed struct won't fit well for
 ISDB-T. The
 reason is that, on ISDB-T, the transmission uses up to 3 hierarchical
 layers.
 Each layer may have different OFDM parameters, so the devices (at least,
 this is the
 case for Siano) has a group of statistics per layer.

 I'm afraid that newer standards may also bring different ways to present
 statistics, and
 the current proposal won't fit well.

 So, in my opinion, if it is chosen any struct-based approach, we'll have a
 bad time to
 maintain it, as it won't fit into all cases, and we'll need to add some
 tricks to extend
 the struct.

 So, my vote is for the DVBS2API approach, where a new group of statistics
 can easily be added,
 on an elegant way, without needing of re-discussing the better API or to
 find a way to extend
 the size of an struct without breaking backward compatibility.

 From a purely technical standpoint the DVBS2API is by definition more
 flexible and future-proof. The V4L API has taken the same approach with
 controls (basically exactly the same mechanism). Should it be necessary in
 the future to set multiple properties atomically, then the same technique
 as V4L can be used (see VIDIOC_S_EXT_CTRLS).

 The alternative is to make structs with lots of reserved fields. It
 depends on how predictable the API is expected to be. Sometimes you can be
 reasonably certain that there won't be too many additions in the future
 and then using reserved fields is perfectly OK.

 Just my 5 cents based on my V4L experience.


In fact, I don't see the advantage using a specific get/set, since
what i proposed just reads back basic data types such as a u32 for any
instance and those being standard  for any digital communication
system. It makes no sense to go for a get/set approach. For example
there cannot be multiple properties for any digital system such as
BER, just that there are different measure s such as kilograms,
pounds, grams etc. Which is what the whole patch is meant to do in a
performance critical manner.
The whole patch is a kind of get/set and and hence it doesn't need a
get/set at the micro level.

To contradict the reverse, as an example, I can state that weight is
not measured in centimeters or inches or feet, to put it in laymans
terms. if you say that it needs to be expressed as such,  then i very
well see that there is something conceptually wrong in your thought.

My 2 cents, for those who don't understand the issue.

Regards,
Manu
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New DVB-Statistics API

2009-12-09 Thread Manu Abraham
On Wed, Dec 9, 2009 at 3:43 AM, Mauro Carvalho Chehab
mche...@redhat.com wrote:
 Manu Abraham wrote:

 Not true. As pointed at the previous answer, the difference between a new 
 ioctl
 and S2API is basically the code at dtv_property_prepare_get_stats() and
 dtv_property_process_get(). This is a pure code that uses a continuous 
 struct
 that will likely be at L3 cache, inside the CPU chip. So, this code will run
 really quickly.



 AFAIK, cache eviction occurs with syscalls: where content in the
 caches near the CPU cores is pushed to the outer cores, resulting in
 cache misses. Not all CPU's are equipped with L3 caches. Continuous
 syscalls can result in TLB cache misses from what i understand, which
 is expensive.

 It is very likely that the contents of struct fe to go into the cache during 
 the
 syscall. I was conservative when I talked about L3. Depending on the cache 
 sizes,
 I won't doubt that the needed fields of the fe struct will go to L1 cache.



Ah, so the data structure which is there in the ioctl approach as well
and less likely to get cache hits since the calls are lesser.


 As current CPU's runs at the order of Teraflops (as the CPU clocks are at 
 gigahertz
 order, and CPU's can handle multiple instructions per clock cycle), the 
 added delay
 is in de order of nanosseconds.


 Consider STB's where DVB is generally deployed rather than the small
 segment of home users running a stack on a generic PC.

 Even with STB, let's assume a very slow cpu that runs at 100 
 Megabytes/second. So, the clock
 speed is 10 nanoseconds. Assuming that this CPU doesn't have a good pipeline, 
 being
 capable of handling only one instruction per second, you'll have one 
 instruction at executed
 at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is 
 faster than this).

Incorrect.
A CPU doesn't execute instruction per clock cycle. Clock cycles
required to execute an instruction do vary from 2 cycles 12 cycles
varying from CPU to CPU.


 An I/O operation at i2c is of the order of 10^-3. Assuming that an additional 
 delay of 10%
 (10 ^ -4) will deadly affect realtime capability (with it is very doubtful), 
 this means that
 the CPU can run up to 10,000 (!!!) instructions to generate such delay. If 
 you compile that code
 and check the number or extra instructions I bet it will be shorter enough to 
 not cause any
 practical effect.

 So, even on such bad hardware that is at least 20x slower than a netbook 
 running at 1Gbps,
 what determines the delay is the amount of I/O you're doing, and not the 
 number of extra
 code.


The I/O overhead required to read 4 registers from hardware is the
same whether you use the ioctl approach or s2api.


   Hardware I/O is the most expensive operation involved.

 True. That's what I said.

 Case #1: the ioctl approach
code stripped

 Now Case #2: based on s2api
code stripped

 Now that we can see the actual code flow, we can see the s2api
 approach requires an unnecessary large tokenizer/serializer, involving
 multiple function calls.

 Are you seeing there 10.000 assembler instructions or so? If not, the size of 
 the code is
 not relevant.

 Also: gcc optimizer transforms switches into a binary tree. So, if you have 64
 cases on switch, it will require 7 comparations (log2(64)) to get a match.

 For example, a quick look at the code you've presented, let's just calculate
 the number of operations on the dtv_property_proccess_get() routine, without
 debug stuff:

 static int dtv_property_process_get() {
CMP (if fe-ops.get_property)
CMP (if r  0)    This if only happens if the 
 first one is executed. On my patch, it is not executed
(the code you posted is the 
 one before my patch)
SWITCH (7 CMP's)  due to binary tree optimization 
 done by gcc
MOV
 }

 So, that entire code (that has about 200 lines) has, in fact
 9 comparations and one move instruction.

 At dtv_property_prepare_get_stats(), the code is even cheaper: just a switch 
 with 8
 elements (log2(8) = 3), so 3 comparations, and one move instruction.

 The additional cost on dvb_frontend_ioctl_properties is:
2 MOVs
One loop calling dtv_property_prepare_get_stats() - up to 4 times to 
 retrieve
 all quality values
one INC
one CMP and function CALL (the same cost exists also with the struct)
One loop calling dtv_property_get_stats() - up to 4 times to retrieve
 all quality values

 So, if I've calculated it right, we're talking about 2+1+16+1+2+1+40 = 63 
 instructions.

 2) the userspace-kernelspace payload.

 Case #1: The size of S2API structs. It will range from 24 to 84 (depending on 
 what
 you want to get, from one to 4 different value pairs).

 Case #2: The size of the ioctl struct: about 30 bytes (If I summed the size 
 of all structs correctly).

 payload of S2API is generally bigger, except if just one 

Re: New DVB-Statistics API

2009-12-09 Thread Mauro Carvalho Chehab
Manu Abraham wrote:
 On Wed, Dec 9, 2009 at 3:43 AM, Mauro Carvalho Chehab
 mche...@redhat.com wrote:

 Even with STB, let's assume a very slow cpu that runs at 100 
 Megabytes/second. So, the clock
 speed is 10 nanoseconds. Assuming that this CPU doesn't have a good 
 pipeline, being
 capable of handling only one instruction per second, you'll have one 
 instruction at executed
 at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is 
 faster than this).
 
 Incorrect.
 A CPU doesn't execute instruction per clock cycle. Clock cycles
 required to execute an instruction do vary from 2 cycles 12 cycles
 varying from CPU to CPU.

See the description of an old Pentium MMX processor (the sucessor of i586, 
running up to 200 MHz):
http://www.intel.com/design/archives/processors/mmx/docs/243185.htm

Thanks to superscalar architecture, it runs 2 instructions per clock cycle 
(IPC).

Newer processors can run more instructions per clock cycle. For example, any 
Pentium-4 processor,
can do 3 IPC:
http://www.intel.com/support/processors/pentium4/sb/CS-017371.htm

 So, even on such bad hardware that is at least 20x slower than a netbook 
 running at 1Gbps,
 what determines the delay is the amount of I/O you're doing, and not the 
 number of extra
 code.
 
 
 The I/O overhead required to read 4 registers from hardware is the
 same whether you use the ioctl approach or s2api.

It seems you got my point. What will determinate the delay is the number of 
I/O's, and not the
amount of instructions.
 
 Eventually, as you have pointed out yourself, The data struct will be
 in the cache all the time for the ioctl approach. The only new
 addition to the existing API in the ioctl case is a CALL instruction
 as compared to the numerous instructions in comparison to that you
 have pointed out as with the s2api approach.

True, but, as shown, the additional delay introduced by the code is less than 
0.01%, even on
a processor that has half of the speed of a 12-year old very slow CPU (a 
Pentium MMX @ 100 MHz
is capable of 2 IPC. My calculus assumed 1 IPC).

So, what will affect the delay is the number of I/O you need to do.

To get all data that the ioctl approach struct has, the delay for S2API will be 
equal.
To get less data, S2API will have a small delay.

Cheers,
Mauro.
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New DVB-Statistics API

2009-12-09 Thread Julian Scheel

Am 09.12.09 14:02, schrieb Mauro Carvalho Chehab:

Manu Abraham wrote:
   

On Wed, Dec 9, 2009 at 3:43 AM, Mauro Carvalho Chehab
mche...@redhat.com  wrote:
 
   

Even with STB, let's assume a very slow cpu that runs at 100 Megabytes/second. 
So, the clock
speed is 10 nanoseconds. Assuming that this CPU doesn't have a good pipeline, 
being
capable of handling only one instruction per second, you'll have one 
instruction at executed
at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is 
faster than this).
   

Incorrect.
A CPU doesn't execute instruction per clock cycle. Clock cycles
required to execute an instruction do vary from 2 cycles 12 cycles
varying from CPU to CPU.
 

See the description of an old Pentium MMX processor (the sucessor of i586, 
running up to 200 MHz):
http://www.intel.com/design/archives/processors/mmx/docs/243185.htm

Thanks to superscalar architecture, it runs 2 instructions per clock cycle 
(IPC).

Newer processors can run more instructions per clock cycle. For example, any 
Pentium-4 processor,
can do 3 IPC:
http://www.intel.com/support/processors/pentium4/sb/CS-017371.htm
   


I don't think you can just take the average IPC rates into account for 
this. When doing a syscall the processors TLB cache will be cleared, 
which will force the CPU to go to the whole instruction pipeline before 
the first syscall instruction is actually executed. This will introduce 
a delay for each syscall you make. I'm not exactly sure about the length 
of the delay, but I think it should be something like 2 clock cycles.

So, even on such bad hardware that is at least 20x slower than a netbook 
running at 1Gbps,
what determines the delay is the amount of I/O you're doing, and not the number 
of extra
code.
   


The I/O overhead required to read 4 registers from hardware is the
same whether you use the ioctl approach or s2api.
 

It seems you got my point. What will determinate the delay is the number of 
I/O's, and not the
amount of instructions.
   
The number of hardware I/Os is constant for both cases, so we do not 
need to discuss them as pro/con for any of the proposals.

Eventually, as you have pointed out yourself, The data struct will be
in the cache all the time for the ioctl approach. The only new
addition to the existing API in the ioctl case is a CALL instruction
as compared to the numerous instructions in comparison to that you
have pointed out as with the s2api approach.
 

True, but, as shown, the additional delay introduced by the code is less than 
0.01%, even on
a processor that has half of the speed of a 12-year old very slow CPU (a 
Pentium MMX @ 100 MHz
is capable of 2 IPC. My calculus assumed 1 IPC).

So, what will affect the delay is the number of I/O you need to do.

To get all data that the ioctl approach struct has, the delay for S2API will be 
equal.
To get less data, S2API will have a small delay.
   
Imho the S2API would be slower when reading all data the ioctl fetches, 
due to the way the instructions would be handled.


Correct me, if I'm wrong with any of this.

Cheers,
Julian

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New DVB-Statistics API

2009-12-09 Thread Mauro Carvalho Chehab
Julian Scheel wrote:
 Am 09.12.09 14:02, schrieb Mauro Carvalho Chehab:
 Manu Abraham wrote:

 I don't think you can just take the average IPC rates into account for
 this. When doing a syscall the processors TLB cache will be cleared,
 which will force the CPU to go to the whole instruction pipeline before
 the first syscall instruction is actually executed. This will introduce
 a delay for each syscall you make. I'm not exactly sure about the length
 of the delay, but I think it should be something like 2 clock cycles.

True, but this delay is common to both S2API and struct.

 To get all data that the ioctl approach struct has, the delay for
 S2API will be equal.
 To get less data, S2API will have a small delay.

 Imho the S2API would be slower when reading all data the ioctl fetches,
 due to the way the instructions would be handled.
 
 Correct me, if I'm wrong with any of this.

Not sure if I understood your question.

On both cases, just one function call will go to the driver, with one struct
(struct fe, the case of S2API) or two structs (struct fe and the
stats-specific struct(s)) in the case of a new ioctl(s).

As drivers are free to implement any logic, the driver can implement exactly
the same logic with both API calls. So, the delay to get the info will be
equal on both cases.

In a practical case, this will take at least a few milisseconds to retrieve all
data. It may take even more, since, on some drivers, you may need to wait for
some condition to happen before start measuring, in order to be sure that you'll
be getting atomic and accurate values.

After the function return, with an struct, you can just return, while, with 
S2API,
you'll need to put the data into the proper payload fields, but this will add
a delay in the order of nanoseconds.

Let's say that, to get all data, the routine needs 10 milisseconds.

The difference between new ioctl or S2API will be of about 0,00063 milliseconds
on a Pentium MMX. On a machine with 1GHz of clock, 2 IPC, the difference will
be 0,315 milliseconds.

Considering that the Linux kernel is preemptive, and an interrupt or the 
scheduler
could be called during the 10 milliseconds time, I doubt you'll be able to
notice that difference on any practical use case.

On the other hand, if you need for example just the strength of the signal at 
the AGC,
if you call via struct, you'll still be consuming the same 10 ms, while, with 
S2API,
you can do it, let's say, on 1 ms (the real numbers will depend, of course, on 
how
much I/O is needed on hardware, and on how many time do you need to wait there 
to 
get an stable value).

So, if you want to do things like moving a rotor, S2API will give better 
results.
If you want all stats, it will give the same result as a new ioctl.

While I don't think that 0,315 milliseconds is worth enough to cause any 
troubles,
With a simple change like the one bellow, this time can be reduced to 0,105 
milisseconds
with a patch like that (the patch were simplified to change just quality, but, 
of course,
such approach needs to be done on the other fields to get this result):


Index: master/linux/drivers/media/dvb/dvb-core/dvb_frontend.c
===
--- master.orig/linux/drivers/media/dvb/dvb-core/dvb_frontend.c
+++ master/linux/drivers/media/dvb/dvb-core/dvb_frontend.c
@@ -1220,6 +1220,7 @@ static int dtv_property_prepare_get_stat
switch (tvp-cmd) {
case DTV_FE_QUALITY:
fe-dtv_property_cache.need_stats |= FE_NEED_QUALITY;
+   fe-dtv_property_cache.quality = tvp-u.data;
break;
case DTV_FE_QUALITY_UNIT:
fe-dtv_property_cache.need_stats |= FE_NEED_QUALITY_UNIT;
@@ -1384,9 +1385,6 @@ static int dtv_property_process_get(stru
break;

/* Quality measures */
-   case DTV_FE_QUALITY:
-   tvp-u.data = fe-dtv_property_cache.quality;
-   break;
case DTV_FE_QUALITY_UNIT:
tvp-u.data = fe-dtv_property_cache.quality_unit;
break;
@@ -1696,10 +1697,12 @@ static int dvb_frontend_ioctl_properties
}
}
 
-   for (i = 0; i  tvps-num; i++) {
-   (tvp + i)-result = dtv_property_process_get(fe,
-   tvp + i, inode, file, need_get_ops);
-   err |= (tvp + i)-result;
+   if (need_get_ops) {
+   for (i = 0; i  tvps-num; i++) {
+   (tvp + i)-result = dtv_property_process_get(fe,
+   tvp + i, inode, file, 
need_get_ops);
+   err |= (tvp + i)-result;
+   }
}
 
if (copy_to_user(tvps-props, tvp, tvps-num * sizeof(struct 
dtv_property))) {
Index: master/linux/drivers/media/dvb/dvb-core/dvb_frontend.h

Re: New DVB(!!)-Statistics API

2009-12-08 Thread Julian Scheel

Am 08.12.09 10:16, schrieb Julian Scheel:

Hello together,

after the last thread which asked about signal statistics details 
degenerated into a discussion about the technical possibilites for 
implementing an entirely new API, which lead to nothing so far, I 
wanted to open a new thread to bring this forward. Maybe some more 
people can give their votes for the different options


Actually Manu did propose a new API for fetching enhanced statistics. 
It uses new IOCTL to directly fetch the statistical data in one go 
from the frontend. This propose was so far rejected by Mauro who wants 
to use the S2API get/set calls instead.


Now to give everyone a quick overview about the advantages and 
disadvantages of both approaches I will try to summarize it up:


1st approach: Introduce new IOCTL

Pros:
- Allows a quick fetch of the entire dataset, which ensures that:
 1. all values are fetched in one go (as long as possible) from the 
frontend, so that they can be treated as one united and be valued in 
conjunction
 2. the requested values arrive the caller in an almost constant 
timeframe, as the ioctl is directly executed by the driver
- It does not interfere with the existing statistics API, which has to 
be kept alive as it is in use for a long time already. (unifying it's 
data is not the approach of this new API)


Cons:
- Forces the application developers to interact with two APIs. The 
S2API for tuning on the one hand and the Statistics API for reading 
signal values on the other hand.


2nd approach: Set up S2API calls for reading statistics

Pros:
- Continous unification of the linuxtv API, allowing all calls to be 
made through one API. - easy for application developers


Cons:
- Due to the key/value pairs used for S2API getters the statistical 
values can't be read as a unique block, so we loose the guarantee, 
that all of the values can be treatend as one unit expressing the 
signals state at a concrete time.
- Due to the general architecture of the S2API the delay between 
requesting and receiving the actual data could become too big to allow 
realtime interpretation of the data (as it is needed for applications 
like satellite finders, etc.)


I hope that this summary is technically correct in all points, if not 
I'd be thankful for objective corrections. I am not going to 
articulate my personal opinion in this mail, but I will do it in 
another mail in reply to this one, so that this mail can be seen as a 
neutral summary of the current situation.


So now it's everyones turn to think about the options and give an 
opinion. In the end I am quite sure that all of us would have great 
benefits of an improved statistics API.


Cheers,
Julian
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


The above mail of course has nothing to do with USB-Statistics. I must 
have been slightly confused when typing the message title! Sorry for that!

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New DVB-Statistics API

2009-12-08 Thread Mauro Carvalho Chehab
Hi Julian,

Let me add some corrections to your technical analysis.

Julian Scheel wrote:
 Hello together,
 
 after the last thread which asked about signal statistics details
 degenerated into a discussion about the technical possibilites for
 implementing an entirely new API, which lead to nothing so far, I wanted
 to open a new thread to bring this forward. Maybe some more people can
 give their votes for the different options
 
 Actually Manu did propose a new API for fetching enhanced statistics. It
 uses new IOCTL to directly fetch the statistical data in one go from the
 frontend. This propose was so far rejected by Mauro who wants to use the
 S2API get/set calls instead.
 
 Now to give everyone a quick overview about the advantages and
 disadvantages of both approaches I will try to summarize it up:
 
 1st approach: Introduce new IOCTL
 
 Pros:
 - Allows a quick fetch of the entire dataset, which ensures that:
  1. all values are fetched in one go (as long as possible) from the
 frontend, so that they can be treated as one united and be valued in
 conjunction
  2. the requested values arrive the caller in an almost constant
 timeframe, as the ioctl is directly executed by the driver
 - It does not interfere with the existing statistics API, which has to
 be kept alive as it is in use for a long time already. (unifying it's
 data is not the approach of this new API)
 
 Cons:
 - Forces the application developers to interact with two APIs. The S2API
 for tuning on the one hand and the Statistics API for reading signal
 values on the other hand.
 
 2nd approach: Set up S2API calls for reading statistics
 
 Pros:
 - Continous unification of the linuxtv API, allowing all calls to be
 made through one API. - easy for application developers

- Scaling values can be retrieved/negotiated (if we implement the set
mode) before requesting the stats, using the same API;

- API can be easily extended to support other statistics that may be needed
by newer DTV standards.

 
 Cons:
 - Due to the key/value pairs used for S2API getters the statistical
 values can't be read as a unique block, so we loose the guarantee, that
 all of the values can be treatend as one unit expressing the signals
 state at a concrete time.

You missed the point here. The proposal patch groups all S2API
pairs into a _single_ call into the driver:

 + for (i = 0; i  tvps-num; i++)
 + need_get_ops += dtv_property_prepare_get_stats(fe,
 +  tvp + i, inode, file);
 +
 + if (!fe-dtv_property_cache.need_stats) {
 + need_get_ops++;
 + } else {
 + if (fe-ops.get_stats) {
 + err = fe-ops.get_stats(fe);
 + if (err  0)
 + return err;
 + }
 + }

The dtv_property_prepare_get_stats will generate a bitmap field (need_stats) 
that
will describe all value pairs that userspace is expecting. After doing it,
a single call is done to get_stats() callback.

All the driver need to do is to fill all values at dtv_property_cache. If the 
driver
fills more values than requested by the user, the extra values will simply be 
discarded.

In order to reduce latency, the driver may opt to not read the register values 
for the
data that aren't requested by the user, like I did on cx24123 driver.

Those values will all be returned at the same time to userspace by a single 
copy_to_user()
operation.

 - Due to the general architecture of the S2API the delay between
 requesting and receiving the actual data could become too big to allow
 realtime interpretation of the data (as it is needed for applications
 like satellite finders, etc.)

Not true. As pointed at the previous answer, the difference between a new ioctl
and S2API is basically the code at dtv_property_prepare_get_stats() and
dtv_property_process_get(). This is a pure code that uses a continuous struct
that will likely be at L3 cache, inside the CPU chip. So, this code will run
really quickly.

As current CPU's runs at the order of Teraflops (as the CPU clocks are at 
gigahertz
order, and CPU's can handle multiple instructions per clock cycle), the added 
delay
is in de order of nanosseconds. 

On the other hand, a simple read of a value from an i2c device is in the order
of milisseconds, since I2C serial bus, speed is slow (typically operating at
100 Kbps).

So, the delay is determined by the number of I2C calls you have at the code.

With the new ioctl proposal, as you need to read all data from I2C (even the 
ones
that userspace don't need), you'll have two situations:
- if you use S2API call to request all data provided by ioctl approach,
the delay will be the same;
- if you use S2API call to request less stats, the S2API delay will
be shorter. 

For example, with cx24123, the S2API delay it will be 6 times shorter than the

Re: New DVB-Statistics API

2009-12-08 Thread Manu Abraham
On Tue, Dec 8, 2009 at 5:22 PM, Mauro Carvalho Chehab
mche...@redhat.com wrote:
 Hi Julian,

 Let me add some corrections to your technical analysis.

 Julian Scheel wrote:
 Hello together,

 after the last thread which asked about signal statistics details
 degenerated into a discussion about the technical possibilites for
 implementing an entirely new API, which lead to nothing so far, I wanted
 to open a new thread to bring this forward. Maybe some more people can
 give their votes for the different options

 Actually Manu did propose a new API for fetching enhanced statistics. It
 uses new IOCTL to directly fetch the statistical data in one go from the
 frontend. This propose was so far rejected by Mauro who wants to use the
 S2API get/set calls instead.

 Now to give everyone a quick overview about the advantages and
 disadvantages of both approaches I will try to summarize it up:

 1st approach: Introduce new IOCTL

 Pros:
 - Allows a quick fetch of the entire dataset, which ensures that:
  1. all values are fetched in one go (as long as possible) from the
 frontend, so that they can be treated as one united and be valued in
 conjunction
  2. the requested values arrive the caller in an almost constant
 timeframe, as the ioctl is directly executed by the driver
 - It does not interfere with the existing statistics API, which has to
 be kept alive as it is in use for a long time already. (unifying it's
 data is not the approach of this new API)

 Cons:
 - Forces the application developers to interact with two APIs. The S2API
 for tuning on the one hand and the Statistics API for reading signal
 values on the other hand.

 2nd approach: Set up S2API calls for reading statistics

 Pros:
 - Continous unification of the linuxtv API, allowing all calls to be
 made through one API. - easy for application developers

 - Scaling values can be retrieved/negotiated (if we implement the set
 mode) before requesting the stats, using the same API;

 - API can be easily extended to support other statistics that may be needed
 by newer DTV standards.


 Cons:
 - Due to the key/value pairs used for S2API getters the statistical
 values can't be read as a unique block, so we loose the guarantee, that
 all of the values can be treatend as one unit expressing the signals
 state at a concrete time.

 You missed the point here. The proposal patch groups all S2API
 pairs into a _single_ call into the driver:

 + for (i = 0; i  tvps-num; i++)
 + need_get_ops += dtv_property_prepare_get_stats(fe,
 +  tvp + i, inode, file);
 +
 + if (!fe-dtv_property_cache.need_stats) {
 + need_get_ops++;
 + } else {
 + if (fe-ops.get_stats) {
 + err = fe-ops.get_stats(fe);
 + if (err  0)
 + return err;
 + }
 + }

 The dtv_property_prepare_get_stats will generate a bitmap field (need_stats) 
 that
 will describe all value pairs that userspace is expecting. After doing it,
 a single call is done to get_stats() callback.

 All the driver need to do is to fill all values at dtv_property_cache. If the 
 driver
 fills more values than requested by the user, the extra values will simply be 
 discarded.

 In order to reduce latency, the driver may opt to not read the register 
 values for the
 data that aren't requested by the user, like I did on cx24123 driver.

 Those values will all be returned at the same time to userspace by a single 
 copy_to_user()
 operation.

 - Due to the general architecture of the S2API the delay between
 requesting and receiving the actual data could become too big to allow
 realtime interpretation of the data (as it is needed for applications
 like satellite finders, etc.)

 Not true. As pointed at the previous answer, the difference between a new 
 ioctl
 and S2API is basically the code at dtv_property_prepare_get_stats() and
 dtv_property_process_get(). This is a pure code that uses a continuous struct
 that will likely be at L3 cache, inside the CPU chip. So, this code will run
 really quickly.



AFAIK, cache eviction occurs with syscalls: where content in the
caches near the CPU cores is pushed to the outer cores, resulting in
cache misses. Not all CPU's are equipped with L3 caches. Continuous
syscalls can result in TLB cache misses from what i understand, which
is expensive.


These are the numbers Intel lists for a Pentium M:

To  Cycles
Register= 1
L1d ~3
L2  ~14
Main Memory ~240



 As current CPU's runs at the order of Teraflops (as the CPU clocks are at 
 gigahertz
 order, and CPU's can handle multiple instructions per clock cycle), the added 
 delay
 is in de order of nanosseconds.


Consider STB's where DVB is generally deployed rather than the small
segment of home users running a stack on 

Re: New DVB-Statistics API

2009-12-08 Thread Mauro Carvalho Chehab
Manu Abraham wrote:

 Not true. As pointed at the previous answer, the difference between a new 
 ioctl
 and S2API is basically the code at dtv_property_prepare_get_stats() and
 dtv_property_process_get(). This is a pure code that uses a continuous struct
 that will likely be at L3 cache, inside the CPU chip. So, this code will run
 really quickly.
 
 
 
 AFAIK, cache eviction occurs with syscalls: where content in the
 caches near the CPU cores is pushed to the outer cores, resulting in
 cache misses. Not all CPU's are equipped with L3 caches. Continuous
 syscalls can result in TLB cache misses from what i understand, which
 is expensive.

It is very likely that the contents of struct fe to go into the cache during the
syscall. I was conservative when I talked about L3. Depending on the cache 
sizes,
I won't doubt that the needed fields of the fe struct will go to L1 cache.

 As current CPU's runs at the order of Teraflops (as the CPU clocks are at 
 gigahertz
 order, and CPU's can handle multiple instructions per clock cycle), the 
 added delay
 is in de order of nanosseconds.
 
 
 Consider STB's where DVB is generally deployed rather than the small
 segment of home users running a stack on a generic PC.

Even with STB, let's assume a very slow cpu that runs at 100 Megabytes/second. 
So, the clock
speed is 10 nanoseconds. Assuming that this CPU doesn't have a good pipeline, 
being
capable of handling only one instruction per second, you'll have one 
instruction at executed
at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is 
faster than this).

An I/O operation at i2c is of the order of 10^-3. Assuming that an additional 
delay of 10%
(10 ^ -4) will deadly affect realtime capability (with it is very doubtful), 
this means that
the CPU can run up to 10,000 (!!!) instructions to generate such delay. If you 
compile that code
and check the number or extra instructions I bet it will be shorter enough to 
not cause any
practical effect.

So, even on such bad hardware that is at least 20x slower than a netbook 
running at 1Gbps,
what determines the delay is the amount of I/O you're doing, and not the number 
of extra
code.

  Hardware I/O is the most expensive operation involved. 

True. That's what I said.

 Case #1: the ioctl approach
code stripped

 Now Case #2: based on s2api
code stripped

 Now that we can see the actual code flow, we can see the s2api
 approach requires an unnecessary large tokenizer/serializer, involving
 multiple function calls.

Are you seeing there 10.000 assembler instructions or so? If not, the size of 
the code is
not relevant. 

Also: gcc optimizer transforms switches into a binary tree. So, if you have 64
cases on switch, it will require 7 comparations (log2(64)) to get a match.

For example, a quick look at the code you've presented, let's just calculate
the number of operations on the dtv_property_proccess_get() routine, without
debug stuff:

static int dtv_property_process_get() {
CMP (if fe-ops.get_property)
CMP (if r  0)    This if only happens if the 
first one is executed. On my patch, it is not executed
(the code you posted is the one 
before my patch)
SWITCH (7 CMP's)  due to binary tree optimization 
done by gcc
MOV
}

So, that entire code (that has about 200 lines) has, in fact
9 comparations and one move instruction.

At dtv_property_prepare_get_stats(), the code is even cheaper: just a switch 
with 8
elements (log2(8) = 3), so 3 comparations, and one move instruction.

The additional cost on dvb_frontend_ioctl_properties is:
2 MOVs
One loop calling dtv_property_prepare_get_stats() - up to 4 times to 
retrieve
all quality values
one INC
one CMP and function CALL (the same cost exists also with the struct)
One loop calling dtv_property_get_stats() - up to 4 times to retrieve
all quality values

So, if I've calculated it right, we're talking about 2+1+16+1+2+1+40 = 63 
instructions.

So, the real executed code is very cheap. On that very slow CPU, the delay will 
be
about 63 x 10 nanosseconds = 630 nanosseconds of delay (being 220ns before the
call and 410ns after it).

This means 0.063% of a delay at the order of milisseconds required just one
I2C read (and, in general, you need more than one single read to get the stats).

Even if the code would double the size, the latency won't be affected.

 Aspect #1. Comparing Case #1 and Case #2, i guess anyone will agree
 that case 2 has very much code overhead in terms of the serialization
 and serialization set/unset.

It has some overhead, but if you are concerned about the realtime case, 
just run kernel perf tools. You'll see my point. If you have less I/O
to do (to retrieve just the data you need), Case #2 is much faster than Case #1.

 Aspect #2, Comparing the data payload between Case #1 and Case #2, i
 guess anyone can