Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/245#issuecomment-39141551
  
    @yinxusen Yes, feature transformation should be done before learning
algorithms. This gives a better separation of concerns. It also allows us to
plug in more powerful feature transformation tools in the future. I'm thinking
about PMML at this time, but there might be other options. The user should
decide whether to cache the data before or after transformation. Sometimes it
is expensive to cache the transformed data because of densification or an
explosion of the feature space. But IMHO this shouldn't be handled by learning
algorithms. Ideally, feature transformation includes adding the intercept. But
since it is used so commonly, I kept the option but set its default to false.
Prepending the intercept requires re-allocating vectors. You can easily see the
difference.
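
A minimal sketch of the idea, assuming dense feature vectors stored as plain
Python lists. The names `add_intercept`, `transform_dataset` and `dot` are
hypothetical and are not Spark MLlib APIs. It shows intercept handling done as
a separate transformation step before learning, and why prepending costs a
re-allocation: each transformed vector is a new, longer copy of the original.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical (label, dense features) pair, loosely mirroring MLlib's LabeledPoint.
LabeledPoint = Tuple[float, List[float]]


def add_intercept(features: List[float]) -> List[float]:
    """Prepend a constant 1.0 so a linear model can learn a bias weight.

    A list cannot grow at the front without moving every element, so this
    allocates a new list of length n + 1 and copies the n original values.
    The input list is left unchanged.
    """
    return [1.0] + features


def transform_dataset(
    data: Iterable[LabeledPoint],
    transform: Callable[[List[float]], List[float]],
) -> List[LabeledPoint]:
    """Apply a feature transformation as its own step, before any learner runs."""
    return [(label, transform(x)) for label, x in data]


def dot(weights: List[float], features: List[float]) -> float:
    """Linear prediction; the learner needs no special intercept logic."""
    return sum(w * x for w, x in zip(weights, features))


if __name__ == "__main__":
    raw = [(3.0, [1.0]), (5.0, [2.0])]
    prepared = transform_dataset(raw, add_intercept)
    print(prepared)  # [(3.0, [1.0, 1.0]), (5.0, [1.0, 2.0])]
    # Weights [bias, slope] = [1.0, 2.0] reproduce both labels.
    print([dot([1.0, 2.0], x) for _, x in prepared])  # [3.0, 5.0]
```

Because the learner only ever sees already-transformed vectors, the caller
chooses whether to cache `raw` or `prepared`, which matches the trade-off
described above.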
