I am sorry, I used VecDuplictaeVecs not MatDuplicateVecs Eda Oktay <[email protected]>, 22 May 2020 Cum, 20:31 tarihinde şunu yazdı:
> Dear Richard, > > Thank you for your email. From MATLAB's kmeans() function I believe I got > the final clustering index set, not centroids. What I am trying to do is to > cluster vectors created by MatDuplicateVecs() according to the index set > (whose type is not IS since I took it from MATLAB) that I obtained from > MATLAB. I am trying to cluster these vectors however since they are > parallel, I couldn't understand how to cluster them. > > Normally, I have to be independent from MATLAB so I will try your > suggestion, grateful for that. However, because of my limited > knowledge about PETSc and parallel computing, I am not able to figure out > how to cluster parallel vectors according to an index set. > > Thanks, > > Eda > > Mills, Richard Tran <[email protected]>, 30 Nis 2020 Per, 02:07 tarihinde > şunu yazdı: > >> Hi Eda, >> >> Thanks for your reply. I'm still trying to understand why you say you >> need to duplicate the row vectors across all processes. When I have >> implemented parallel k-means, I don't duplicate the row vectors. (This >> would be very unscalable and largely defeat the point of doing this with >> MPI parallelism in the first place.) >> >> Earlier in this email thread, you said that you have used Matlab to get >> cluster IDs for each row vector. Are you trying to then use this >> information to calculate the cluster centroids from inside your PETSc >> program? If so, you can do this by having each MPI rank do the following: >> For cluster i in 0 to (k-1), calculate the element-wise sum of all of the >> local rows that belong to cluster i, then use MPI_Allreduce() to calculate >> the global elementwise sum of all the local sums (this array will be >> replicated across all MPI ranks), and finally divide by the number of >> members of that cluster to get the centroid. Note that MPI_Allreduce() >> doesn't work on PETSc objects, but simple arrays, so you'll want to use >> something like MatGetValues() or MatGetRow() to access the elements of your >> row vectors. >> >> Let me know if I am misunderstanding what you are aiming to do, or if I >> am misunderstanding something. >> >> It sounds like you would benefit from having some routines in PETSc to do >> k-means (or other) clustering, by the way? >> >> Best regards, >> Richard >> >> On 4/29/20 3:47 AM, Eda Oktay wrote: >> >> Dear Richard, >> >> I am trying to use spectral clustering algorithm by using k-means >> clustering algorithm at some point. I am doing this by producing a matrix >> consisting of eigenvectors (of the adjacency matrix of the graph that I >> want to partition), then forming row vectors of this matrix. This is the >> part that I am using parallel vector. By using the output from k-means, I >> am trying to cluster these row vectors. To cluster these vectors, I think I >> need all row vectors in all processes. I wanted to use sequential vectors, >> however, I couldn't find a different way that I form row vectors of a >> matrix. >> >> I am trying to use VecScatterCreateToAll, however, since my vector is >> parallel crated by VecDuplicateVecs, my input is not in correct type, so I >> get error. I still can't get how can I use this function in parallel vector >> created by VecDuplicateVecs. >> >> Thank you all for your help. >> >> Eda >> >> Mills, Richard Tran <[email protected]>, 7 Nis 2020 Sal, 01:51 tarihinde >> şunu yazdı: >> >>> Hi Eda, >>> >>> I think that you probably want to use VecScatter routines, as Junchao >>> has suggested, instead of the lower level star forest for this. I >>> believe that VecScatterCreateToZero() is what you want for the broadcast >>> problem you describe, in the second part of your question. I'm not sure >>> what you are trying to do in the first part. Taking a parallel vector >>> and then copying its entire contents to a sequential vector residing on >>> each process is not scalable, and a lot of the design that has gone into >>> PETSc is to prevent the user from ever needing to do things like that. >>> Can you please tell us what you intend to do with these sequential >>> vectors? >>> >>> I'm also wondering why, later in your message, you say that you get >>> cluster assignments from Matlab, and then "to cluster row vectors >>> according to this information, all processors need to have all of the >>> row vectors". Do you mean you want to get all of the row vectors copied >>> onto all of the processors so that you can compute the cluster >>> centroids? If so, computing the cluster centroids can be done without >>> copying the row vectors onto all processors if you use a communication >>> operation like MPI_Allreduce(). >>> >>> Lastly, let me add that I've done a fair amount of work implementing >>> clustering algorithms on distributed memory parallel machines, but >>> outside of PETSc. I was thinking that I should implement some of these >>> routines using PETSc. I can't get to this immediately, but I'm wondering >>> if you might care to tell me a bit more about the clustering problems >>> you need to solve and how having some support for this in PETSc might >>> (or might not) help. >>> >>> Best regards, >>> Richard >>> >>> On 4/4/20 1:39 AM, Eda Oktay wrote: >>> > Hi all, >>> > >>> > I created a parallel vector UV, by using VecDuplicateVecs since I need >>> > row vectors of a matrix. However, I need the whole vector be in all >>> > processors, which means I need to gather all and broadcast them to all >>> > processors. To gather, I tried to use VecStrideGatherAll: >>> > >>> > Vec UVG; >>> > VecStrideGatherAll(UV,UVG,INSERT_VALUES); >>> > VecView(UVG,PETSC_VIEWER_STDOUT_WORLD); >>> > >>> > however when I try to view the vector, I get the following error. >>> > >>> > [3]PETSC ERROR: Invalid argument >>> > [3]PETSC ERROR: Wrong type of object: Parameter # 1 >>> > [3]PETSC ERROR: See >>> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>> shooting. >>> > [3]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 >>> > [3]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a >>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 >>> > 11:22:54 2020 >>> > [3]PETSC ERROR: Wrong type of object: Parameter # 1 >>> > [0]PETSC ERROR: See >>> > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>> shooting. >>> > [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 >>> > [0]PETSC ERROR: ./clustering_son_final_edgecut_without_parmetis on a >>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 >>> > 11:22:54 2020 >>> > [0]PETSC ERROR: Configure options --download-mpich --download-openblas >>> > --download-slepc --download-metis --download-parmetis --download-chaco >>> > --with-X=1 >>> > [0]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >>> > ./clustering_son_final_edgecut_without_parmetis on a >>> > arch-linux2-c-debug named localhost.localdomain by edaoktay Sat Apr 4 >>> > 11:22:54 2020 >>> > [1]PETSC ERROR: Configure options --download-mpich --download-openblas >>> > --download-slepc --download-metis --download-parmetis --download-chaco >>> > --with-X=1 >>> > [1]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >>> > Configure options --download-mpich --download-openblas >>> > --download-slepc --download-metis --download-parmetis --download-chaco >>> > --with-X=1 >>> > [3]PETSC ERROR: #1 VecStrideGatherAll() line 646 in >>> > /home/edaoktay/petsc-3.11.1/src/vec/vec/utils/vinv.c >>> > >>> > I couldn't understand why I am getting this error. Is this because of >>> > UV being created by VecDuplicateVecs? How can I solve this problem? >>> > >>> > The other question is broadcasting. After gathering all elements of >>> > the vector UV, I need to broadcast them to all processors. I found >>> > PetscSFBcastBegin. However, I couldn't understand the PetscSF concept >>> > properly. I couldn't adjust my question to the star forest concept. >>> > >>> > My problem is: If I have 4 processors, I create a matrix whose columns >>> > are 4 smallest eigenvectors, say of size 72. Then by defining each row >>> > of this matrix as a vector, I cluster them by using k-means >>> > clustering algorithm. For now, I cluster them by using MATLAB and I >>> > obtain a vector showing which row vector is in which cluster. After >>> > getting this vector, to cluster row vectors according to this >>> > information, all processors need to have all of the row vectors. >>> > >>> > According to this problem, how can I use the star forest concept? >>> > >>> > I will be glad if you can help me about this problem since I don't >>> > have enough knowledge about graph theory. An if you have any idea >>> > about how can I use k-means algorithm in a more practical way, please >>> > let me know. >>> > >>> > Thanks! >>> > >>> > Eda >>> >> >>
