[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872427#comment-15872427 ] Deron Eriksson commented on SYSTEMML-1277: -- This is addressed by [PR397|https://github.com/apache/incubator-systemml/pull/397]. [~mwdus...@us.ibm.com] Could you resolve this issue if it works with your real-world data example? [~xwu0226] Mike hit this issue while working on the SystemML Breast Cancer project, which involves deep learning. See [PR347|https://github.com/apache/incubator-systemml/pull/347]. We recently updated SystemML from {{mllib.Vector}} to the newer {{ml.Vector}}. The fix is simply to support both formats.
> DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
> ---
>
> Key: SYSTEMML-1277
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
> Project: SystemML
> Issue Type: Bug
> Affects Versions: SystemML 0.13
> Reporter: Mike Dusenberry
> Assignee: Deron Eriksson
> Priority: Blocker
>
> Recently, we made the switch from the old {{mllib.Vector}} to the new
> {{ml.Vector}} type. Unfortunately, this leaves us with the issue of no
> longer recognizing DataFrames with {{mllib.Vector}} columns during
> conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}}
> objects, (2) instead fall back on conversion to {{Frame}} objects, and then
> (3) fail completely when the ensuing DML script expects to operate on
> matrices.
>
> Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int,
> sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the
> following script now fails (it did not previously):
> {code}
> from systemml import MLContext, dml
> ml = MLContext(sc)  # assumes an existing SparkContext named sc
>
> script = """
> # Scale images to [-1,1]
> X = X / 255
> X = X * 2 - 1
> """
> outputs = ("X",)  # one-element tuple; ("X") would just be the string "X"
> script = dml(script).input(X=X_df).output(*outputs)
> X = ml.execute(script).get(*outputs)
> X
> {code}
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while validating script
>     at org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
>     at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
>     at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
>     ... 12 more
> Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME SCALAR
>     at org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
>     at org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
>     at org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
>     at org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
>     at org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
>     at org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
>     at org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
>     ... 14 more
> {code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872364#comment-15872364 ] Xin Wu commented on SYSTEMML-1277: -- Is this issue also for Deep Learning?
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870577#comment-15870577 ] Mike Dusenberry commented on SYSTEMML-1277: --- Adding the following fixes the issue, so we should add a similar wrapper at the Java MLContext layer.
{code}
from pyspark.mllib.util import MLUtils

# Convert DataFrame columns of type `mllib.Vector` to type `ml.Vector`
X_df = MLUtils.convertVectorColumnsToML(X_df)
{code}
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870565#comment-15870565 ] Mike Dusenberry commented on SYSTEMML-1277: --- Update: Here's the official word on DataFrame conversions from the old {{mllib.Vector}} to {{ml.Vector}}: https://spark.apache.org/docs/2.0.0/ml-guide.html#breaking-changes.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870536#comment-15870536 ] Mike Dusenberry commented on SYSTEMML-1277: --- Also, to follow up: the {{ml.Vector}} type should remain the default, as Spark is moving away from {{mllib.Vector}}. However, DataFrames created and saved with {{mllib.Vector}} columns can still be loaded and used, often without the user realizing that a saved DataFrame preserves the distinction between the two types. A user could therefore run the same SystemML code on the same DataFrame as before and now hit this issue. We could simply catch any {{mllib.Vector}} values and convert them to {{ml.Vector}} with {{mllib.Vector.asML}}, which does not copy the underlying data --> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vector.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870515#comment-15870515 ] Mike Dusenberry commented on SYSTEMML-1277: --- cc [~deron]