[
https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758956#comment-16758956
]
Peter Bacsko commented on YARN-9265:
------------------------------------
Thanks [~tangzhankun], the new device framework looks promising.
But still, we have two problems with this:
1) When will this be released? Any ETA? We have to deliver FPGA support to our
customers and most likely won't have enough time to wait for it.
2) Somehow the cards still need to be autodetected.
As an interim solution, I suggest either the pluggable parser or a manual setup
configured through a YARN property. We can deprecate it later.
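To make the manual-setup idea concrete, here is a rough sketch of how a device list supplied through a property value might be parsed. Everything in it is an assumption for illustration: the class name, the method name, and the "name=devicefile" entry format are hypothetical and do not correspond to any existing YARN property or Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: parses a manually configured
// device list such as "acl0=/dev/intel-fpga-port.0" (assumed format).
final class ManualFpgaDevices {

  // Returns device name -> device file, in the order the entries were given.
  static Map<String, String> parse(String value) {
    Map<String, String> devices = new LinkedHashMap<>();
    if (value == null || value.trim().isEmpty()) {
      return devices; // nothing configured manually
    }
    for (String entry : value.split(",")) {
      String[] kv = entry.split("=", 2);
      String name = kv[0].trim();
      String file = kv.length == 2 ? kv[1].trim() : "";
      if (name.isEmpty() || file.isEmpty()) {
        throw new IllegalArgumentException("Bad device entry: " + entry);
      }
      devices.put(name, file);
    }
    return devices;
  }

  public static void main(String[] args) {
    // Device name and file taken from the issue description below.
    System.out.println(parse("acl0=/dev/intel-fpga-port.0"));
  }
}
```

With an explicit mapping like this, the plugin would not need to guess the file name under /dev from the "Physical Dev Name" at all.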
> FPGA plugin fails to recognize Intel Processing Accelerator Card
> ----------------------------------------------------------------
>
> Key: YARN-9265
> URL: https://issues.apache.org/jira/browse/YARN-9265
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 3.1.0
> Reporter: Peter Bacsko
> Priority: Critical
>
> The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card).
> There are two major issues.
> Problem #1
> The output of aocl diagnose:
> {noformat}
> --------------------------------------------------------------------
> Device Name:
> acl0
>
> Package Pat:
> /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp
>
> Vendor: Intel Corp
>
> Physical Dev Name Status Information
>
> pac_a10_f200000 Passed PAC Arria 10 Platform (pac_a10_f200000)
> PCIe 08:00.0
> FPGA temperature = 79 degrees C.
>
> DIAGNOSTIC_PASSED
> --------------------------------------------------------------------
>
> Call "aocl diagnose <device-names>" to run diagnose for specified devices
> Call "aocl diagnose all" to run diagnose for all devices
> {noformat}
> The plugin fails to recognize this and fails with the following message:
> {noformat}
> 2019-01-25 06:46:02,834 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin:
> Using FPGA vendor plugin:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
> 2019-01-25 06:46:02,943 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer:
> Trying to diagnose FPGA information ...
> 2019-01-25 06:46:03,085 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule:
> Using traffic control bandwidth handler
> 2019-01-25 06:46:03,108 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
> Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn
> 2019-01-25 06:46:03,139 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl:
> FPGA Plugin bootstrap success.
> 2019-01-25 06:46:03,247 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
> Couldn't find (?i)bus:slot.func\s=\s.*, pattern
> 2019-01-25 06:46:03,248 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
> Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern
> 2019-01-25 06:46:03,251 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
> Failed to get major-minor number from reading /dev/pac_a10_f300000
> 2019-01-25 06:46:03,252 ERROR
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to
> bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
> No FPGA devices detected!
> {noformat}
> Problem #2
> The plugin assumes that the file name under {{/dev}} can be derived from the
> "Physical Dev Name", but this is wrong. For example, it thinks that the
> device file is {{/dev/pac_a10_f300000}}, which is not the case; the actual
> file is {{/dev/intel-fpga-port.0}}.
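
The mismatch in Problem #1 can be reproduced outside the NodeManager with a few lines of Java. This is only a sketch: the first pattern is copied as it appears in the WARN line of the log above, while the second pattern, the class name, and the method names are assumptions made for illustration and are not taken from the plugin source.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Hadoop.
final class PacDiagnoseSketch {

  // Pattern as printed in the WARN message of the log above.
  static final Pattern OLD_BUS =
      Pattern.compile("(?i)bus:slot.func\\s=\\s.*,");

  // Assumed alternative: the PAC output reports the address as "PCIe 08:00.0".
  static final Pattern PAC_PCIE =
      Pattern.compile("(?i)PCIe\\s+([0-9a-f]{2}:[0-9a-f]{2}\\.[0-7])");

  static boolean oldPatternMatches(String diagnoseOutput) {
    return OLD_BUS.matcher(diagnoseOutput).find();
  }

  // Returns the PCIe address if the assumed PAC pattern is found.
  static Optional<String> pcieAddress(String diagnoseOutput) {
    Matcher m = PAC_PCIE.matcher(diagnoseOutput);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  public static void main(String[] args) {
    // Lines taken from the "aocl diagnose" output quoted above.
    String sample = String.join("\n",
        "pac_a10_f200000 Passed PAC Arria 10 Platform (pac_a10_f200000)",
        "PCIe 08:00.0",
        "FPGA temperature = 79 degrees C.");
    System.out.println("old pattern matches: " + oldPatternMatches(sample));
    System.out.println("PCIe address: " + pcieAddress(sample).orElse("not found"));
  }
}
```

The PAC output contains no "bus:slot.func" line at all, so any parser that requires one will come up empty regardless of how the rest of the text is handled.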