Hi,

While multiplying a matrix by its transpose (to get all pairwise dot products), I got this error:

    pyspark.sql.utils.IllegalArgumentException: requirement failed: Number of rows divided by rowsPerBlock cannot exceed maximum integer.
Here is the code:

    from pyspark.ml.feature import Normalizer
    from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

    normalizer = Normalizer(inputCol="feature", outputCol="norm")
    data = normalizer.transform(tfidf)

    mat = IndexedRowMatrix(
        data.select("ID", "norm")
            .rdd.map(lambda row: IndexedRow(row.ID, row.norm.toArray()))).toBlockMatrix()

    dot = mat.multiply(mat.transpose())
    dot.toLocalMatrix().toArray()

The traceback points to this line:

    .rdd.map(lambda row: IndexedRow(row.ID, row.norm.toArray()))).toBlockMatrix()

I reduced the data to only 5 sentences, but I still get the error!
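For completeness, here is a minimal self-contained sketch of the same pipeline on a toy DataFrame. The IDs, vectors, and app name are made up just so the snippet runs on its own; only the sequence of calls mirrors my real code:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import Normalizer
    from pyspark.ml.linalg import Vectors
    from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

    spark = SparkSession.builder.appName("similarity-sketch").getOrCreate()

    # Toy stand-in for the real tfidf DataFrame: small, 0-based integer IDs
    # and short dense vectors in the "feature" column.
    tfidf = spark.createDataFrame(
        [(0, Vectors.dense([1.0, 0.0, 2.0])),
         (1, Vectors.dense([0.0, 3.0, 1.0])),
         (2, Vectors.dense([2.0, 1.0, 0.0]))],
        ["ID", "feature"])

    # L2-normalize each row (Normalizer defaults to p=2), build the
    # distributed matrix, then multiply it by its transpose to get all
    # pairwise dot products.
    normalizer = Normalizer(inputCol="feature", outputCol="norm")
    data = normalizer.transform(tfidf)

    mat = IndexedRowMatrix(
        data.select("ID", "norm")
            .rdd.map(lambda row: IndexedRow(row.ID, row.norm.toArray()))).toBlockMatrix()

    dot = mat.multiply(mat.transpose())
    print(dot.toLocalMatrix().toArray())

In the toy version I kept the IDs small and 0-based on purpose; my (possibly wrong) understanding is that the row count of an IndexedRowMatrix is inferred from the largest index, which seems to be what the error message is complaining about.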