Hello Karli,

Here is a problem I am facing. I have the expression `(X - y*c') .^2.rowwise().sum()` (in Eigen notation), where X is a row-major matrix with rows >> cols. In the code below, y corresponds to `vcl_Ones` and c to `currCluster`.
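To make the intended result concrete, here is a minimal plain C++ sketch (standard library only, no ViennaCL or Eigen) of what I expect the expression to compute, assuming y is a vector of ones so that y*c' repeats c on every row. The function name `rowwise_squared_distance` and the flat row-major storage are only for illustration and do not come from either library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of ViennaCL or Eigen).
// X is stored row-major as a flat array of rows*cols values.
// Returns, for each row i, the sum over j of (X(i,j) - c[j])^2.
std::vector<double> rowwise_squared_distance(const std::vector<double>& X,
                                             std::size_t rows,
                                             std::size_t cols,
                                             const std::vector<double>& c)
{
    assert(X.size() == rows * cols);  // flat buffer must match the shape
    assert(c.size() == cols);         // one entry of c per column

    std::vector<double> result(rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i) {
        double acc = 0.0;
        for (std::size_t j = 0; j < cols; ++j) {
            const double d = X[i * cols + j] - c[j];
            acc += d * d;
        }
        result[i] = acc;
    }
    return result;
}
```

For example, with X = [[1, 2, 3], [4, 5, 6]] and c = [1, 1, 1], this gives 5 for the first row and 50 for the second.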
I have initialized all data using the process given in custom_context.cpp. This is the code that pertains to ViennaCL:

`viennacl::matrix<ScalarType, viennacl::row_major> vcl_X(bufPoints(), temp.rows(), temp.cols());`
`viennacl::vector<ScalarType> vcl_Ones(bufOnes(), temp.rows());`
`viennacl::vector<ScalarType> vcl_Ones2(testOnes(), cols);`
`viennacl::vector<ScalarType> currCluster(testPoint(),cols);`
`viennacl::vector<ScalarType> vcl_s1 = (viennacl::linalg::prod(viennacl::linalg::element_pow((vcl_X - viennacl::linalg::outer_prod(vcl_Ones, currCluster)),2.0),vcl_Ones2));`

The time taken to execute this operation is 1.54 (data size: 1936*1216 rows and 3 columns). This time excludes the data offload to the GPU.

If I implement the same operation using Eigen on the CPU (without any optimization), the time reported is 0.04754! Both versions produce the same results.

So what could be wrong here? Am I missing something?

Sumit
