Louiszr commented on a change in pull request #29445:
URL: https://github.com/apache/spark/pull/29445#discussion_r471731819
##########
File path: python/pyspark/ml/tuning.py
##########
@@ -536,7 +536,7 @@ def copy(self, extra=None):
bestModel = self.bestModel.copy(extra)
avgMetrics = self.avgMetrics
subModels = self.subModels
- return CrossValidatorModel(bestModel, avgMetrics, subModels)
+ return self._copyValues(CrossValidatorModel(bestModel, avgMetrics,
subModels), extra=extra)
Review comment:
We can. However I just found one potential issue with using
`Params.copy` (not specific to `CrossValidator`). It creates a shallow copy of
`self` (i.e. the models). Hence if we run the below snippet
```python
cvModelCopied = cvModel.copy()
cvModel.avgMetrics[0] = 'foo'
assert cvModelCopied.avgMetrics[0] != 'foo' # This will fail
```
Based on the Scala equivalent I think `avgMetrics` should be shallow copied
and `subModels` should be copied with the copying actions delegated to `copy()`
of each model. I will push a change to this function.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]