I'm assuming that you are computing the cosine similarity and that you have two vectors containing <term, tfidf> pairs. The two vectors can have different sizes because each one contains only the terms with tfidf != 0. To compute the cosine similarity between the two lists, the dot product only needs the pairs whose term appears in **both vectors**: if a term is missing from one of the two, its product is 0, so it does not contribute to the final similarity score. (The norms in the denominator are still computed over each vector's own non-zero terms.)
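A minimal sketch of the idea above, assuming each document's vector is held as a plain `Map<String, Double>` from term to tf-idf weight. The class name `SparseCosine` and the method `cosine` are hypothetical and are not part of Lucene or Solr; how the maps get filled from the term vectors is left out.

```java
import java.util.Map;

// Hypothetical helper, not a Lucene/Solr class.
final class SparseCosine {
    private SparseCosine() {}

    /** Cosine similarity between two sparse term -> tf-idf maps of possibly different sizes. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Walk the smaller map and look each term up in the larger one.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;

        // Only terms present in both maps contribute to the dot product.
        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }

        // Each norm uses all non-zero terms of its own vector.
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty vector is treated as having similarity 0
        }
        return dot / (normA * normB);
    }

    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }
}
```

With this layout there is no need to build dense vectors over the whole vocabulary: calling `SparseCosine.cosine(doc1, doc2)` on two maps of different sizes gives the same value as padding both with zeros first.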
(Really old) Example: https://github.com/diegoceccarelli/dexter/blob/fb4bbcb27a13da2665f3c19d6c75bfc4f5778440/dexter-core/src/main/java/it/cnr/isti/hpc/dexter/lucene/LuceneHelper.java#L386

From: solr-user@lucene.apache.org At: 01/06/18 17:24:07
To: solr-user@lucene.apache.org
Subject: Re: Personalized search parameters

Don't we need vectors of the same size to calculate the cosine similarity? Maybe I missed something, but following that example it looks like I have to manually recreate the sparse vectors, because the term vector of a document should (I may be wrong) contain only the terms that appear in that document. Am I wrong?

Given that, I assumed (and that example goes in that direction) that we have to manually create the sparse vectors by first collecting all the terms and then calculating the tf-idf value for each term in each document. That's what I did, and I obtained vectors of the same dimension for each document. I was just wondering if there was a better, more optimized way to obtain those sparse vectors.