[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-17 Thread Deron Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872427#comment-15872427
 ] 

Deron Eriksson commented on SYSTEMML-1277:
--

This is addressed by 
[PR397|https://github.com/apache/incubator-systemml/pull/397].

[~mwdus...@us.ibm.com] Could you resolve this issue if it works with your 
real-world data example?

[~xwu0226] Mike hit this issue while working on the SystemML Breast Cancer 
project, which involves deep learning. See 
[PR347|https://github.com/apache/incubator-systemml/pull/347]. We recently 
updated SystemML from mllib.Vector to the newer ml.Vector. The fix is simply 
to support both formats.
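
A minimal sketch of what "supporting both formats" could look like at the type-detection step. This is not the code from PR397; the helper {{target_type}}, the {{NUMERIC_TYPES}} set, and the string-based schema are assumptions made for illustration. Only the two fully qualified Spark UDT class names are real.

```python
from typing import FrozenSet, List, Tuple

# Fully qualified names of Spark's two vector UDT classes.
ML_VECTOR_UDT = "org.apache.spark.ml.linalg.VectorUDT"
MLLIB_VECTOR_UDT = "org.apache.spark.mllib.linalg.VectorUDT"

# Illustrative subset of column types treated as numeric (an assumption).
NUMERIC_TYPES = frozenset({"int", "bigint", "float", "double"})


def target_type(schema: List[Tuple[str, str]], vector_types: FrozenSet[str]) -> str:
    """Hypothetical helper: pick "matrix" when every column is numeric or a
    recognized vector type, otherwise fall back to "frame"."""
    for _name, type_name in schema:
        if type_name not in NUMERIC_TYPES and type_name not in vector_types:
            return "frame"
    return "matrix"


# Schema from the report: DataFrame[__INDEX: int, sample: vector] with mllib vectors.
schema = [("__INDEX", "int"), ("sample", MLLIB_VECTOR_UDT)]

only_ml = frozenset({ML_VECTOR_UDT})
both = frozenset({ML_VECTOR_UDT, MLLIB_VECTOR_UDT})

print(target_type(schema, only_ml))  # mllib column not recognized
print(target_type(schema, both))     # both vector types accepted
```

With only the newer type recognized, the sketch falls back to "frame", which mirrors the failure described in the issue; accepting both types yields "matrix".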

> DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
> ---
>
> Key: SYSTEMML-1277
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
> Project: SystemML
> Issue Type: Bug
> Affects Versions: SystemML 0.13
> Reporter: Mike Dusenberry
> Assignee: Deron Eriksson
> Priority: Blocker
>
> Recently, we switched from the old {{mllib.Vector}} type to the new 
> {{ml.Vector}} type.  Unfortunately, we no longer recognize DataFrames with 
> {{mllib.Vector}} columns during conversion, and thus we (1) do not correctly 
> convert them to SystemML {{Matrix}} objects, (2) instead fall back on 
> conversion to {{Frame}} objects, and then (3) fail completely when the 
> ensuing DML script expects to operate on matrices.
> Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, 
> sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the 
> following script now fails (it did not previously):
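> {code}
> from systemml import MLContext, dml
> ml = MLContext(sc)  # assumes an active SparkContext `sc`
> script = """
> # Scale images to [-1,1]
> X = X / 255
> X = X * 2 - 1
> """
> outputs = ("X",)  # trailing comma makes this a one-element tuple
> script = dml(script).input(X=X_df).output(*outputs)
> X = ml.execute(script).get(*outputs)
> X
> {code}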
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while validating script
>   at org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
>   at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
>   at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
>   ... 12 more
> Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME SCALAR
>   at org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
>   at org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
>   at org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
>   at org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
>   at org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
>   at org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
>   at org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
>   ... 14 more
> {code}
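
A side note on the reproduction script, which passes its output names with {{*outputs}}. This is general Python behavior, not anything specific to SystemML: parentheses alone do not create a tuple, so a one-element tuple needs a trailing comma. A single-character name such as "X" unpacks to the same single argument either way, which makes the difference easy to miss. The helper {{unpack}} and the name "Xs" below are hypothetical and exist only to show what a callee receives.

```python
def unpack(*names):
    """Return the positional arguments exactly as the callee receives them."""
    return names


as_string = ("Xs")   # parentheses alone do not make a tuple: this is the str "Xs"
as_tuple = ("Xs",)   # the trailing comma makes a one-element tuple

print(unpack(*as_string))  # the string is unpacked character by character
print(unpack(*as_tuple))   # the single name arrives intact
```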



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-17 Thread Xin Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872364#comment-15872364
 ] 

Xin Wu commented on SYSTEMML-1277:
--

Is this issue also for Deep Learning?



[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-16 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870577#comment-15870577
 ] 

Mike Dusenberry commented on SYSTEMML-1277:
---

Adding the following fixes the issue, so we should just add similar wrappers 
at the Java MLContext layer.

{code}
from pyspark.mllib.util import MLUtils

# Convert DataFrame columns of type `mllib.Vector` to type `ml.Vector`
X_df = MLUtils.convertVectorColumnsToML(X_df)
{code}



[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-16 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870565#comment-15870565
 ] 

Mike Dusenberry commented on SYSTEMML-1277:
---

Update: Here's the official word on DataFrame conversions from the old 
{{mllib.Vector}} to {{ml.Vector}}: 
https://spark.apache.org/docs/2.0.0/ml-guide.html#breaking-changes.



[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-16 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870536#comment-15870536
 ] 

Mike Dusenberry commented on SYSTEMML-1277:
---

Also, just to follow up, the {{ml.Vector}} type should remain the standard 
default, as Spark is moving away from {{mllib.Vector}}.  However, DataFrames 
created and saved with {{mllib.Vector}} types can still be used, often without 
the user realizing that a saved DataFrame keeps the two types distinct.  It is 
therefore plausible that a user will run the same SystemML code on the same 
DataFrame as before and now run into issues.  We could just catch any 
{{mllib.Vector}} types and convert them to {{ml.Vector}} with 
{{mllib.Vector.asML}}, which does not copy the data: 
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vector
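
The catch-and-convert idea can be sketched in plain Python. The classes {{MLVector}} and {{MLlibVector}}, the method {{as_ml}}, and the function {{normalize}} are hypothetical stand-ins, not Spark's or SystemML's types; they only illustrate converting at the input boundary while sharing the underlying data instead of copying it.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class MLVector:
    """Stand-in for the newer ml.Vector type (hypothetical, not Spark's class)."""
    values: List[float]


@dataclass
class MLlibVector:
    """Stand-in for the older mllib.Vector type (hypothetical, not Spark's class)."""
    values: List[float]

    def as_ml(self) -> MLVector:
        # Wrap the same underlying list: the reference is shared, no data is copied.
        return MLVector(self.values)


def normalize(v: Union[MLVector, MLlibVector]) -> MLVector:
    """Catch old-style vectors at the boundary and convert; pass new ones through."""
    return v.as_ml() if isinstance(v, MLlibVector) else v


old = MLlibVector([0.0, 127.5, 255.0])
new = normalize(old)
print(type(new).__name__, new.values is old.values)
```

Because the conversion only rewraps a reference, applying it to every incoming vector should add little overhead, and vectors that already use the newer type pass through unchanged.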



[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-16 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870515#comment-15870515
 ] 

Mike Dusenberry commented on SYSTEMML-1277:
---

cc [~deron]
