[GitHub] [incubator-tvm-vta] remotego commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall

2020-04-29 Thread GitBox


remotego commented on pull request #8:
URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621601302


   > Perhaps one way to look at this is to start with DISTANCE=3 by default. 
And if the II>1 for the GEMM, we issue a warning telling the user to increase 
the distance to 4. The process can repeat until the II of 1 is achieved, and 
the runtime will be informed of the actual distance so proper checks can be 
inserted.
   
   I love this idea.
   
   Our ultimate goal is to achieve II = 1, so it would be great if we could 
reach it with some intelligent scripting. The only problem is that we need to 
introduce a way to feed the results back to the software side.
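
A rough sketch of how that scripting and the feedback to the software side could fit together, in Python. Everything named here is hypothetical: `run_hls` stands in for whatever invokes the HLS tool and reports the achieved GEMM II, `ACC_DEP_DISTANCE` is a made-up config key, and `has_raw_hazard` only illustrates the kind of check the runtime could insert once it knows the distance.

```python
import json
import warnings
from typing import Callable, Sequence


def find_min_distance(run_hls: Callable[[int], int],
                      start: int = 3,
                      max_distance: int = 16) -> int:
    """Increase the dependence distance until the GEMM reaches II == 1.

    run_hls is a hypothetical callback: it takes a distance, runs synthesis,
    and returns the II the tool reported for the GEMM loop.
    """
    for distance in range(start, max_distance + 1):
        ii = run_hls(distance)
        if ii == 1:
            return distance
        warnings.warn(
            f"GEMM II={ii} with distance={distance}; retrying with {distance + 1}"
        )
    raise RuntimeError(f"II=1 not reached with distance <= {max_distance}")


def distance_config(distance: int) -> str:
    """Serialize the chosen distance so the software side can read it.

    The key name is an assumption, not an existing VTA config field.
    """
    return json.dumps({"ACC_DEP_DISTANCE": distance})


def has_raw_hazard(acc_indices: Sequence[int], distance: int) -> bool:
    """Return True if the same accumulator index is touched again
    fewer than `distance` iterations after its previous access."""
    last_seen = {}
    for i, idx in enumerate(acc_indices):
        if idx in last_seen and i - last_seen[idx] < distance:
            return True
        last_seen[idx] = i
    return False
```

The search is linear because the starting point (3) is expected to be close to the answer; the JSON string is just one possible channel for handing the result to the runtime.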



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-tvm-vta] remotego commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall

2020-04-29 Thread GitBox


remotego commented on pull request #8:
URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621593259


   > Thanks @zhanghaohit, for catching this error. I agree with the fact that 
the dependence should not be hardcoded. However, in order not to add too many 
parameters in the config file, can we derive the value from the VTA target 
(i.e. FPGA type) in `pkg_config`?
   > 
   > Also, from what I understand, the dependence distance is generally a 
property of the memory part that the FPGA is using. For instance, that distance 
will be different if one uses BRAM vs. Ultra-RAM? And on a different FPGA 
family this number will change. Correct?
   
   Thanks @tmoreau89 for the advice. We also thought of putting the value into 
`pkg_config`. However, we later found that the dependence distance is not 
tightly tied to the device family or type.
   
   In my opinion, the compiler decides the II based on multiple factors. 
Ultimately, though, it comes down to the number of cycles on the datapath from 
"Read Mem" -> "Perform Ops" -> "Write Back", because we need to avoid RAW 
(read-after-write) data hazards.
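
To make that relationship concrete, here is a small sketch using the usual recurrence bound from pipeline scheduling: a loop-carried dependence with a given latency and distance forces II >= ceil(latency / distance). The function names and the per-stage cycle counts are assumptions for illustration only; the real numbers come from the HLS tool's schedule, which may add further constraints.

```python
def datapath_latency(read_cycles: int, op_cycles: int, write_cycles: int) -> int:
    """Cycles from "Read Mem" through "Perform Ops" to "Write Back"."""
    return read_cycles + op_cycles + write_cycles


def min_initiation_interval(latency_cycles: int, distance: int) -> int:
    """Lower bound on II imposed by a RAW dependence of the given distance.

    A value written in iteration i is read in iteration i + distance, so
    distance * II must cover the whole read-modify-write latency.
    """
    if latency_cycles < 1 or distance < 1:
        raise ValueError("latency and distance must be positive")
    return max(1, -(-latency_cycles // distance))  # integer ceiling division


# Illustrative (made-up) stage latencies: 1 read + 2 op + 1 write = 4 cycles.
latency = datapath_latency(1, 2, 1)
print(min_initiation_interval(latency, 3))  # 2: distance 3 is too short for II=1
print(min_initiation_interval(latency, 4))  # 1: distance 4 covers the datapath
```

Under this model, II = 1 is only possible when the declared distance is at least the datapath latency, which is why anything that lengthens the datapath also raises the required distance.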
   
   In the "Read Mem" and "Write Back" stages, the cycle count is definitely 
related to the properties of the memory (e.g. device family, URAM vs. BRAM, 
etc.). But it is also related to the H/W circuit that accesses the data, for 
example the multiplexers needed because there are multiple accesses. The 
compiler will judge based on the complexity of the overall circuit, and it 
could add registers (increasing the II) if the desired frequency target could 
not be satisfied.
   
   In the "Perform Ops" stage, the number of cycles needed depends mainly on 
the op itself. For example, a 32-bit multiplication may require more cycles 
than an 8-bit multiplication.


