Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/15658#discussion_r85357265
--- Diff: docs/ml-features.md ---
@@ -729,6 +729,61 @@ for more details on the API.
</div>
</div>
+## Interaction
+
+`Interaction` is a `Transformer` which takes a list of vector/double columns, and generate a single vector column
+that contains the interactions (multiplication) among them with proper handling of feature names.
+
+**Examples**
+
+Assume that we have the following DataFrame with columns tree input column:
+
+~~~~
+
+id1 | id2 | id3
+----|-----|-----
+ 0 | 1 | 2
+ 1 | 4 | 3
+ 2 | 6 | 1
+ 3 | 10 | 8
+ 4 | 9 | 2
+ 5 | 1 | 1
+~~~~
+
+Applying `Interaction` with `id1`, `id2`, `id3` as the input columns,
+then `interactedCol` as the output column contains:
+
+~~~~
+id1 | id2 | id3 | interactedCol
+----|-----|-----|---------------
+ 0 | 1 | 2 | [0.0]
+ 1 | 4 | 3 | [0.0]
+ 2 | 6 | 1 | [12.0]
+ 3 | 10 | 8 | [240.0]
+ 4 | 9 | 2 | [72.0]
+ 5 | 1 | 1 | [5.0]
--- End diff --
This example doesn't really show what `Interaction` does. It looks like it
just outputs the product of the columns, but that's the corner case where
every input column is a scalar. Really, it outputs all possible products
formed by taking one element from each input column. Here's a rough example:
```
// Run in spark-shell, where `spark` is already defined
import org.apache.spark.ml.feature.{Interaction, VectorAssembler}
import org.apache.spark.ml.linalg.Vectors

// Two vector-valued input columns with four elements each
val df = spark.createDataFrame(Seq(
  (Vectors.dense(0, 1, 2, 3), Vectors.dense(1, 4, 3, 9)),
  (Vectors.dense(2, 6, 1, 7), Vectors.dense(3, 10, 8, 11))
)).toDF("data1", "data2")

val assembler1 = new VectorAssembler().setInputCols(Array("data1")).setOutputCol("vec1")
val assembler2 = new VectorAssembler().setInputCols(Array("data2")).setOutputCol("vec2")

// Each output element is one element of vec1 times one element of vec2
val interaction = new Interaction().setInputCols(Array("vec1", "vec2")).setOutputCol("interactedCol")

val assembled = assembler2.transform(assembler1.transform(df))
interaction.transform(assembled).select("vec1", "vec2", "interactedCol").show(truncate = false)

+-----------------+-------------------+------------------------------------------------------------------------------+
|vec1             |vec2               |interactedCol                                                                 |
+-----------------+-------------------+------------------------------------------------------------------------------+
|[0.0,1.0,2.0,3.0]|[1.0,4.0,3.0,9.0]  |[0.0,0.0,0.0,0.0,1.0,4.0,3.0,9.0,2.0,8.0,6.0,18.0,3.0,12.0,9.0,27.0]          |
|[2.0,6.0,1.0,7.0]|[3.0,10.0,8.0,11.0]|[6.0,20.0,16.0,22.0,18.0,60.0,48.0,66.0,3.0,10.0,8.0,11.0,21.0,70.0,56.0,77.0]|
+-----------------+-------------------+------------------------------------------------------------------------------+
```
I think a useful example should show something more like this?
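
To make the "one element from each column" point concrete without a Spark
session, here is a minimal plain-Scala sketch of the same arithmetic. The
`InteractionSketch` object and its `interact` helper are hypothetical names
made up for illustration, not part of Spark, and this is not how
`Interaction` is implemented internally. The element ordering is chosen to
match the two-column output above; I have not checked it against Spark for
three or more vector columns.

```scala
object InteractionSketch {
  // Hypothetical helper: each inner Seq stands for one input column's value
  // in a single row (a scalar column is a one-element Seq).
  // Returns every product of one element taken from each column.
  def interact(columns: Seq[Seq[Double]]): Seq[Double] =
    columns.foldLeft(Seq(1.0)) { (acc, col) =>
      // Earlier columns vary slowest, later columns vary fastest
      for (a <- acc; b <- col) yield a * b
    }

  def main(args: Array[String]): Unit = {
    // Two 4-element vectors give 4 * 4 = 16 products (first row above)
    println(interact(Seq(Seq(0.0, 1.0, 2.0, 3.0), Seq(1.0, 4.0, 3.0, 9.0))))

    // Three scalar columns give a single product: 2 * 6 * 1 = 12
    println(interact(Seq(Seq(2.0), Seq(6.0), Seq(1.0))))
  }
}
```

The second call is the corner case from the proposed doc example: with only
scalar inputs there is exactly one combination, so the output collapses to a
plain product and hides the general behaviour.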