HyukjinKwon commented on code in PR #48347:
URL: https://github.com/apache/spark/pull/48347#discussion_r1830059521


##########
docs/ml-features.md:
##########
@@ -855,6 +855,116 @@ for more details on the API.
 
 </div>
 
+## TargetEncoder
+
+[Target 
Encoding](https://www.researchgate.net/publication/220520258_A_Preprocessing_Scheme_for_High-Cardinality_Categorical_Attributes_in_Classification_and_Prediction_Problems)
 is a data-preprocessing technique that transforms high-cardinality categorical 
features into quasi-continuous scalar attributes suited for use in 
regression-type models. This paradigm maps individual values of an independent 
feature to a scalar, representing some estimate of the dependent attribute 
(meaning categorical values that exhibit similar statistics with respect to the 
target will have a similar representation).

Review Comment:
   ```suggestion
   [Target 
Encoding](https://www.researchgate.net/publication/220520258_A_Preprocessing_Scheme_for_High-Cardinality_Categorical_Attributes_in_Classification_and_Prediction_Problems)
 is a data-preprocessing technique that transforms high-cardinality categorical 
features into quasi-continuous scalar attributes suited for use in 
regression-type models. This paradigm maps individual values of an independent 
feature to a scalar, representing some estimate of the dependent attribute 
(meaning categorical values that exhibit similar statistics with respect to the 
target will have a similar representation).
   ```



##########
docs/ml-features.md:
##########
@@ -855,6 +855,116 @@ for more details on the API.
 
 </div>
 
+## TargetEncoder
+
+[Target 
Encoding](https://www.researchgate.net/publication/220520258_A_Preprocessing_Scheme_for_High-Cardinality_Categorical_Attributes_in_Classification_and_Prediction_Problems)
 is a data-preprocessing technique that transforms high-cardinality categorical 
features into quasi-continuous scalar attributes suited for use in 
regression-type models. This paradigm maps individual values of an independent 
feature to a scalar, representing some estimate of the dependent attribute 
(meaning categorical values that exhibit similar statistics with respect to the 
target will have a similar representation).

Review Comment:
   ```suggestion
   [Target 
Encoding](https://www.researchgate.net/publication/220520258_A_Preprocessing_Scheme_for_High-Cardinality_Categorical_Attributes_in_Classification_and_Prediction_Problems)
 is a data-preprocessing technique that transforms high-cardinality categorical 
features into quasi-continuous scalar attributes suited for use in 
regression-type models. This paradigm maps individual values of an independent 
feature to a scalar, representing some estimate of the dependent attribute 
(meaning categorical values that exhibit similar statistics with respect to the 
target will have a similar representation).
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to