Hi there, I'm using *GBTClassifier* to do some classification jobs, and I find the performance of the scoring stage is not satisfying. The trained model has about 160 trees, and the input feature vector is sparse, with a size of a little over 20.
After some digging, I found that when predicting an input vector, the model repeatedly and randomly accesses features in the SparseVector, which eventually calls *breeze.linalg.SparseVector#apply*. That function uses a binary search to locate the requested index, so each access costs O(log numNonZero).

I then tried converting my feature vectors to dense vectors before inference, and the inference stage sped up by about 2~3x, since random access in a DenseVector is O(1).

So my question is: why not use *breeze.linalg.HashVector* when randomly accessing values in a SparseVector? According to Breeze's documentation its random-access complexity is O(1), which is much better than SparseVector in this case.

Thanks,
Vincent
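To illustrate the access-cost difference I'm describing, here is a rough, hypothetical sketch (plain Java, not Breeze's actual implementation): a sparse vector stores its non-zero indices sorted, so a random read must binary-search them, while a dense vector is just an array lookup.

```java
import java.util.Arrays;

// Hypothetical sketch of sparse vs. dense random access; this is NOT
// Breeze's code, just a model of the cost difference described above.
public class SparseVsDense {
    // Mimics breeze.linalg.SparseVector#apply: binary search over the
    // sorted non-zero indices, O(log numNonZero) per lookup.
    static double sparseApply(int[] indices, double[] values, int i) {
        int pos = Arrays.binarySearch(indices, i);
        return pos >= 0 ? values[pos] : 0.0; // absent index => implicit zero
    }

    // Mimics DenseVector#apply: a plain array read, O(1) per lookup.
    static double denseApply(double[] values, int i) {
        return values[i];
    }

    public static void main(String[] args) {
        int[] indices = {0, 3, 7};          // sorted non-zero positions
        double[] values = {1.5, -2.0, 4.0}; // corresponding values
        System.out.println(sparseApply(indices, values, 3)); // -2.0
        System.out.println(sparseApply(indices, values, 5)); // 0.0 (implicit zero)
        double[] dense = {1.5, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 4.0};
        System.out.println(denseApply(dense, 3));            // -2.0
    }
}
```

With ~160 trees each probing several features per prediction, that extra log factor on every feature read is where I suspect the slowdown comes from.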