Hi,

I compiled the python packages with exactly the same configurations and I can't reproduce the issue.

old: CUDA 10.2, ITK 5.1.2, RTK 2.1.0 -> 0.9 s
0.019904613494873047
0.6475656032562256
Reconstructing...
0.9730124473571777

new: CUDA 11.5, ITK 5.2.1, RTK 2.3.0
0.017342329025268555
0.7650339603424072
Reconstructing...
0.8823671340942383

The code I ran is the following:

#!/usr/bin/env python
import sys
import itk
import time
from itk import RTK as rtk

if len ( sys.argv ) < 3:
    print( "Usage: FirstReconstruction <outputimage> <outputgeometry>" )
    sys.exit ( 1 )

# Defines the image type
GPUImageType = rtk.CudaImage[itk.F,3]
CPUImageType = rtk.Image[itk.F,3]

# Defines the RTK geometry object
geometry = rtk.ThreeDCircularProjectionGeometry.New()
numberOfProjections = 200
firstAngle = 0.
angularArc = 360.
sid = 600 # source to isocenter distance
sdd = 1200 # source to detector distance
for x in range(0,numberOfProjections):
    angle = firstAngle + x * angularArc / numberOfProjections
    geometry.AddProjection(sid,sdd,angle)

# Writing the geometry to disk
xmlWriter = rtk.ThreeDCircularProjectionGeometryXMLFileWriter.New()
xmlWriter.SetFilename ( sys.argv[2] )
xmlWriter.SetObject ( geometry )
xmlWriter.WriteFile()

# Create a stack of empty projection images
ConstantImageSourceType = rtk.ConstantImageSource[GPUImageType]
constantImageSource = ConstantImageSourceType.New()
origin = [ -127.75, -127.75, 0. ]
sizeOutput = [ 512, 512, numberOfProjections ]
spacing = [ 0.5, 0.5, 0.5 ]
constantImageSource.SetOrigin( origin )
constantImageSource.SetSpacing( spacing )
constantImageSource.SetSize( sizeOutput )
constantImageSource.SetConstant(0.)

REIType = rtk.RayEllipsoidIntersectionImageFilter[CPUImageType, CPUImageType]
rei = REIType.New()
semiprincipalaxis = [ 50, 50, 50]
center = [ 0, 0, 10]
# Set GrayScale value, axes, center...
rei.SetDensity(2)
rei.SetAngle(0)
rei.SetCenter(center)
rei.SetAxis(semiprincipalaxis)
rei.SetGeometry( geometry )
rei.SetInput(constantImageSource.GetOutput())

# Create reconstructed image
constantImageSource2 = ConstantImageSourceType.New()
sizeOutput = [ 256 ] * 3
origin = [ -63.75 ] * 3
spacing = [ 0.5 ] * 3
constantImageSource2.SetOrigin( origin )
constantImageSource2.SetSpacing( spacing )
constantImageSource2.SetSize( sizeOutput )
constantImageSource2.SetConstant(0.)
t0 = time.time()
constantImageSource2.Update()
t1 = time.time()
print(t1-t0)

# Graft the projections to an itk::CudaImage
projections = GPUImageType.New()
t0 = time.time()
rei.Update()
t1 = time.time()
print(t1-t0)
projections.SetPixelContainer(rei.GetOutput().GetPixelContainer())
projections.CopyInformation(rei.GetOutput())
projections.SetBufferedRegion(rei.GetOutput().GetBufferedRegion())
projections.SetRequestedRegion(rei.GetOutput().GetRequestedRegion())

# FDK reconstruction
print("Reconstructing...")
FDKGPUType = rtk.CudaFDKConeBeamReconstructionFilter
feldkamp = FDKGPUType.New()
feldkamp.SetInput(0, constantImageSource2.GetOutput())
feldkamp.SetInput(1, projections)
feldkamp.SetGeometry(geometry)
feldkamp.GetRampFilter().SetTruncationCorrection(0.0)
feldkamp.GetRampFilter().SetHannCutFrequency(0.0)
t0 = time.time()
feldkamp.Update()
t1 = time.time()
print(t1-t0)

To be honest, I don't see what to do at this stage... Could you check the same code with your two versions? Any other suggestions?
Simon

On Wed, Nov 10, 2021 at 10:03 AM Moritz Schaar <sch...@imt.uni-luebeck.de> wrote:

> Hi Simon,
>
> I completely agree that this is hard to track down. That’s why I am asking
> for directions :)
>
> To be more precise about the execution times of my example:
>
> The timings given in pairs of 17.1/1.2 s and 19/7 s are only the required
> times of the reconstruction step itself.
>
> Reading data, pre- and post-processing are not part of this time
> measurement.
>
> So the 7 s average in python is similar to the 6.41 s I obtained from
> adding everything done in CudaFDKConeBeamReconstructionFilter using
> RTK_PROBE_EACH_FILTER.
>
> The reconstruction step in python simply involves:
>
> - Instantiation of a simple class; this doesn’t add anything to the timings
> - Setting up ConstantImageSource with either rtk.Image or rtk.CudaImage
> - Setting up FDKConeBeamReconstructionFilter/CudaFDKConeBeamReconstructionFilter
> - Setting inputs, geometry and filter
> - Update() and return result
>
> Looks like there was a typo in my mail, the versions compared should be:
>
> old: CUDA 10.2, ITK 5.1.2, RTK 2.1.0
> new: CUDA 11.5, ITK 5.2.1, RTK 2.3.0
>
> Sorry for the confusion and thanks for looking into it!
>
> Best,
> Moritz
>
> *From:* Simon Rit <simon....@creatis.insa-lyon.fr>
> *Sent:* Wednesday, 10 November 2021 09:32
> *To:* Moritz Schaar <sch...@imt.uni-luebeck.de>
> *Cc:* rtk-users@public.kitware.com
> *Subject:* Re: [Rtk-users] Slow CUDA FDK performance
>
> Hi Moritz,
>
> Thanks for the report. It's a bit hard to be convinced that something is
> wrong without being able to reproduce it. From the RTK_PROBE_EACH_FILTER
> log, most of the time is spent reading the projections, which will be the
> same with or without CUDA, so I wonder if this is not the issue here. I can
> try to reproduce the issue; can you just confirm the two configurations:
> CUDA 10.2, ITK 5.2.1, RTK 2.1.0 vs CUDA 11.5, ITK 5.2.1, RTK 2.3.0?
>
> Thanks,
> Simon
>
> On Fri, Nov 5, 2021 at 4:20 PM Moritz Schaar <sch...@imt.uni-luebeck.de>
> wrote:
>
> Hi,
>
> I recently upgraded my Windows 10 system to ITK 5.2.1 including RTK 2.3.0.
>
> This also involved upgrading CUDA from 10.2 to 11.5, Visual Studio 2019
> and even a python update (3.8.5 to 3.8.12).
>
> Using the python wrapping of RTK, I implemented my own routines that use FDK,
> similar to the rtkfdk application.
>
> On the old system (ITK 5.2.1, RTK 2.1.0) I benchmarked the FDK for a
> 512x512x200 dataset reconstructed into 256x256x256 with 1.0 mm isotropic
> voxel size.
>
> The system is equipped with 24 CPU cores and one RTX 2080 Ti, so the CPU
> version took 17.1 and the CUDA version 1.2 seconds.
>
> Running the new software version on the same system results in roughly 19
> s CPU time but more than 7 s for the CUDA version.
>
> I don’t care about the actual timings, but the relative increase of the
> CUDA version is what bothers me.
>
> To dig up some more information I recompiled RTK with
> RTK_PROBE_EACH_FILTER and ran rtkfdk.exe for the same data; this is what I
> got:
>
> ****************************************************************************************************
> Probe Tag                                    Starts  Stops   Time (s)  Memory (kB)  Cuda memory (kB)
> ****************************************************************************************************
> ChangeInformationImageFilter                    200    200  0.0211846            0                 0
> ConstantImageSource                               1      1  0.0305991        65668                 0
> CudaCropImageFilter                              13     13  0.0222911      15786.8           15753.8
> CudaDisplacedDetectorImageFilter                 13     13  0.0540568      10719.1             16384
> CudaFDKBackProjectionImageFilter                 13     13  0.0326397      5051.38           5041.23
> CudaFDKConeBeamReconstructionFilter               1      1    5.72999       552184            211648
> CudaFDKWeightProjectionFilter                    13     13  0.0262806       -13892           630.154
> CudaFFTRampImageFilter                           13     13   0.148416      43095.4           12499.7
> CudaParkerShortScanImageFilter                   13     13  0.0467202      2525.85           15753.8
> ExtractImageFilter                               13     13  0.0259726      15812.3          -15753.8
> ImageFileReader                                 200    200  0.0226735        -0.16                 0
> ImageSeriesReader                               200    200   0.066097         6.12                 0
> ProjectionsReader                                 1      1    26.0388       208488                 0
> StreamingImageFilter                              2      2    16.0663       547512            191840
> VnlRealToHalfHermitianForwardFFTImageFilter       2      2  0.0208174            0                 0
>
> Following the conversation on the mailing list,
> https://public.kitware.com/pipermail/rtk-users/2018-July/010617.html, I
> see that the CudaFDKConeBeamReconstructionFilter takes 6.41 s of which
> roughly 1/3 is spent in the CudaFFTRampImageFilter.
>
> Sadly I don’t have these results for the old software version, so I can’t
> compare these values.
>
> I also played around with v2.2.0, but it doesn’t make a difference.
>
> Sadly, the version I used before (v2.1.0) won’t compile with CUDA 11.5
> anymore. I tried small adjustments, e.g. this commit
> https://github.com/SimonRit/RTK/commit/3d3c7506087f5fa98aee75df5af5c30e7e51cbe6
> but that didn’t make it work.
>
> Similarly, setting up ITK 5.1.2 fails with other errors, so
> getting back the old version for comparison seems impossible.
>
> Is there any direction you can point me to, to check what the issue
> actually is? Or maybe someone has an idea what could be the reason?
> CUDA/RTK/ITK version?
>
> Any help is appreciated.
>
> *Best,*
> *Moritz*
>
> _______________________________________________
> Rtk-users mailing list
> Rtk-users@public.kitware.com
> https://public.kitware.com/mailman/listinfo/rtk-users
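
To put the timings in this thread in relative terms, here is a small sketch that computes the new/old ratio for each pair of numbers reported above (17.1/1.2 s before the upgrade, roughly 19/7 s after, plus the two FDK timings from the test script). The `slowdown` helper and the `timings` dictionary are illustrative names only; they are not part of RTK or ITK.

```python
# Reconstruction timings in seconds, copied from the mails in this thread.
timings = {
    "CPU FDK (Moritz)": {"old": 17.1, "new": 19.0},
    "CUDA FDK (Moritz)": {"old": 1.2, "new": 7.0},
    "CUDA FDK (Simon's script)": {
        "old": 0.9730124473571777,
        "new": 0.8823671340942383,
    },
}


def slowdown(old, new):
    """Return new/old; a value above 1 means the new setup is slower."""
    return new / old


for name, t in timings.items():
    print(f"{name}: {slowdown(t['old'], t['new']):.2f}x")
```

With these inputs the ratios come out at about 1.11 for the CPU run, 5.83 for the CUDA run, and 0.91 for the test script, which is why only the reported CUDA number stands out.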
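
The test script in this thread brackets each Update() call with a pair of time.time() calls. A small context manager can do the same bookkeeping in one place. This is only a sketch: `timed` and `results` are hypothetical names, and time.sleep stands in for a filter's Update() so that the snippet runs without RTK installed.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print the wall-clock time of the enclosed block and optionally store it."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - t0
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.3f} s")


results = {}

# Stand-in for a pipeline step; in the script this would be feldkamp.Update().
with timed("Reconstructing", results):
    time.sleep(0.01)
```

In the script, the same pattern would read `with timed("Reconstructing", results): feldkamp.Update()`, which keeps the measured region limited to the reconstruction step itself, as in the timings discussed above.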