[jira] [Commented] (SPARK-31144) Wrap java.lang.Error with an exception for QueryExecutionListener.onFailure

2021-01-12 Thread Alex Vayda (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263425#comment-17263425
 ] 

Alex Vayda commented on SPARK-31144:


Don't you think that wrapping an {{Error}} in an {{Exception}}, just to be able 
to pass it into a method that, strictly speaking, doesn't expect to be called 
with an {{Error}}, would break the method's semantics?

Wouldn't it be better to introduce another (third) method, say {{onFatal(..., 
th: Throwable)}}, with an empty default implementation (for API backward 
compatibility), that would be called on errors that are considered fatal from 
the Java/Scala perspective? See 
https://www.scala-lang.org/api/2.12.0/scala/util/control/NonFatal$.html
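
For illustration, a minimal sketch of what such a third callback might look 
like (hypothetical: the {{onFatal}} method and the dispatch helper below are 
this proposal's additions, not the actual Spark API; the two existing methods 
mirror {{QueryExecutionListener}}):

{code:scala}
import org.apache.spark.sql.execution.QueryExecution
import scala.util.control.NonFatal

trait QueryExecutionListener {
  // Existing callbacks, unchanged, so current implementations stay compatible.
  def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit
  def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit

  // Proposed third callback for fatal throwables (e.g. java.lang.Error).
  // The empty default implementation preserves API backward compatibility.
  def onFatal(funcName: String, qe: QueryExecution, th: Throwable): Unit = ()
}

object ListenerDispatch {
  // Hypothetical dispatch: non-fatal exceptions keep going to onFailure,
  // while everything NonFatal rejects (java.lang.Error etc.) goes to onFatal.
  def fireFailure(l: QueryExecutionListener, funcName: String,
                  qe: QueryExecution, th: Throwable): Unit = th match {
    case NonFatal(e: Exception) => l.onFailure(funcName, qe, e)
    case fatal                  => l.onFatal(funcName, qe, fatal)
  }
}
{code}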

> Wrap java.lang.Error with an exception for QueryExecutionListener.onFailure
> ---
>
> Key: SPARK-31144
> URL: https://issues.apache.org/jira/browse/SPARK-31144
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
> Fix For: 2.4.6, 3.0.0
>
>
> SPARK-28556 changed the method QueryExecutionListener.onFailure to allow 
> Spark to send java.lang.Error to this method. As this change breaks APIs, we 
> cannot fix branch-2.4.
> [~marmbrus] suggested wrapping java.lang.Error with an exception instead, to 
> avoid a breaking change. A bonus of this solution is that we can also fix the 
> issue (if a query throws java.lang.Error, QueryExecutionListener doesn't get 
> notified) in branch-2.4.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24647) Sink Should Return OffsetSeqs For ProgressReporting

2018-06-25 Thread Alex Vayda (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Vayda updated SPARK-24647:
---
Target Version/s: 2.4.0

> Sink Should Return OffsetSeqs For ProgressReporting
> ---
>
> Key: SPARK-24647
> URL: https://issues.apache.org/jira/browse/SPARK-24647
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Vaclav Kosar
>Priority: Major
> Fix For: 2.4.0
>
>
> To be able to track data lineage for Structured Streaming (I intend to 
> implement this in the open-source project Spline), the monitoring needs to 
> track not only where the data was read from but also where the results were 
> written to. To my knowledge, this could best be implemented by monitoring 
> {{StreamingQueryProgress}}. However, batch data offsets are currently not 
> available on the {{Sink}} interface. Implementing this as proposed would 
> also bring symmetry to the {{StreamingQueryProgress}} fields {{sources}} 
> and {{sink}}.
>  
> *Similar Proposals*
> Made in the following JIRAs; these would not be sufficient for lineage 
> tracking.
>  * https://issues.apache.org/jira/browse/SPARK-18258
>  * https://issues.apache.org/jira/browse/SPARK-21313
>  
> *Current State*
>  * The method {{Sink#addBatch}} returns {{Unit}}.
>  * {{StreamingQueryProgress}} reports the start and end {{offsetSeq}} in its 
> {{sourceProgress}} value, but {{sinkProgress}} only contains the sink's 
> {{toString}} output.
> {code:java}
>   "sources" : [ {
>     "description" : "KafkaSource[Subscribe[test-topic]]",
>     "startOffset" : null,
>     "endOffset" : { "test-topic" : { "0" : 5000 }},
>     "numInputRows" : 5000,
>     "processedRowsPerSecond" : 645.3278265358803
>   } ],
>   "sink" : {
>     "description" : 
> "org.apache.spark.sql.execution.streaming.ConsoleSink@9da556f"
>   }
> {code}
>  
>  
> *Proposed State*
>  * {{Sink#addBatch}} to return an {{OffsetSeq}} or {{StreamProgress}} 
> specifying the offsets of the written batch; Kafka, for example, does 
> something similar by returning a {{RecordMetadata}} object from its 
> {{send}} method.
>  * {{StreamingQueryProgress}} to incorporate {{sinkProgress}} in a similar 
> fashion to {{sourceProgress}}.
>  
>  
> {code:java}
>   "sources" : [ {
>     "description" : "KafkaSource[Subscribe[test-topic]]",
>     "startOffset" : null,
>     "endOffset" : { "test-topic" : { "0" : 5000 }},
>     "numInputRows" : 5000,
>     "processedRowsPerSecond" : 645.3278265358803
>   } ],
>   "sink" : {
>     "description" : 
> "org.apache.spark.sql.execution.streaming.ConsoleSink@9da556f",
>    "startOffset" : null,
>     "endOffset" { "sinkTopic": { "0": 333 }}
>   }
> {code}
>  
> *Implementation*
>  * PR submitters: Likely me and [~wajda], once the discussion concludes 
> positively.
>  * {{Sinks}}: Modify all sinks to conform to the new interface or return 
> dummy values.
>  * {{ProgressReporter}}: Merge offsets from different batches properly, 
> similarly to how it is done for sources.
>  






[jira] [Updated] (SPARK-24647) Sink Should Return OffsetSeqs For ProgressReporting

2018-06-25 Thread Alex Vayda (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Vayda updated SPARK-24647:
---
Affects Version/s: (was: 2.4.0)
   2.3.1







[jira] [Updated] (SPARK-24647) Sink Should Return OffsetSeqs For ProgressReporting

2018-06-25 Thread Alex Vayda (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Vayda updated SPARK-24647:
---
Fix Version/s: 2.4.0







[jira] [Updated] (SPARK-24647) Sink Should Return OffsetSeqs For ProgressReporting

2018-06-25 Thread Alex Vayda (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Vayda updated SPARK-24647:
---
Description: 
To be able to track data lineage for Structured Streaming (I intend to 
implement this in the open-source project Spline), the monitoring needs to 
track not only where the data was read from but also where the results were 
written to. To my knowledge, this could best be implemented by monitoring 
{{StreamingQueryProgress}}. However, batch data offsets are currently not 
available on the {{Sink}} interface. Implementing this as proposed would also 
bring symmetry to the {{StreamingQueryProgress}} fields {{sources}} and 
{{sink}}.

*Similar Proposals*

Made in the following JIRAs; these would not be sufficient for lineage 
tracking.
 * https://issues.apache.org/jira/browse/SPARK-18258
 * https://issues.apache.org/jira/browse/SPARK-21313

*Current State*
 * The method {{Sink#addBatch}} returns {{Unit}}.
 * {{StreamingQueryProgress}} reports the start and end {{offsetSeq}} in its 
{{sourceProgress}} value, but {{sinkProgress}} only contains the sink's 
{{toString}} output (the current interface is sketched after the example 
output below).

{code:java}
  "sources" : [ {
    "description" : "KafkaSource[Subscribe[test-topic]]",
    "startOffset" : null,
    "endOffset" : { "test-topic" : { "0" : 5000 }},
    "numInputRows" : 5000,
    "processedRowsPerSecond" : 645.3278265358803
  } ],
  "sink" : {
    "description" : 
"org.apache.spark.sql.execution.streaming.ConsoleSink@9da556f"
  }
{code}
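
For reference, the sink hook in question currently looks roughly like this 
(abridged from {{org.apache.spark.sql.execution.streaming.Sink}}):

{code:scala}
import org.apache.spark.sql.DataFrame

trait Sink {
  // The engine hands over the batch, but nothing about where the data
  // was written is ever reported back.
  def addBatch(batchId: Long, data: DataFrame): Unit
}
{code}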
 

 

*Proposed State*
 * {{Sink#addBatch}} to return an {{OffsetSeq}} or {{StreamProgress}} 
specifying the offsets of the written batch (see the sketch below); Kafka, 
for example, does something similar by returning a {{RecordMetadata}} object 
from its {{send}} method.
 * {{StreamingQueryProgress}} to incorporate {{sinkProgress}} in a similar 
fashion to {{sourceProgress}}.

 

 
{code:java}
  "sources" : [ {
    "description" : "KafkaSource[Subscribe[test-topic]]",
    "startOffset" : null,
    "endOffset" : { "test-topic" : { "0" : 5000 }},
    "numInputRows" : 5000,
    "processedRowsPerSecond" : 645.3278265358803
  } ],
  "sink" : {
    "description" : 
"org.apache.spark.sql.execution.streaming.ConsoleSink@9da556f",
   "startOffset" : null,
    "endOffset" { "sinkTopic": { "0": 333 }}
  }
{code}
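
A minimal sketch of the proposed signature change (hypothetical; whether the 
return type should be {{OffsetSeq}}, {{StreamProgress}}, or something else 
entirely is part of the discussion):

{code:scala}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.OffsetSeq

trait Sink {
  // Proposed: return the offsets that were actually written, so that
  // ProgressReporter can surface them as sinkProgress start/end offsets,
  // mirroring what is already reported for sources.
  def addBatch(batchId: Long, data: DataFrame): OffsetSeq
}
{code}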
 

*Implementation*
 * {{Sinks}}: Modify all sinks to conform to the new interface or return 
dummy values.
 * {{ProgressReporter}}: Merge offsets from different batches properly, 
similarly to how it is done for sources.

 

> Sink Should Return OffsetSeqs For ProgressReporting
> ---
>
> Key: SPARK-24647
> URL: https://issues.apache.org/jira/browse/SPARK-24647
> Project: Spark
>  Issue Type: Improvement
>   

[jira] [Updated] (SPARK-24647) Sink Should Return OffsetSeqs For ProgressReporting

2018-06-25 Thread Alex Vayda (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Vayda updated SPARK-24647:
---
Description: 
To be able to track data lineage for Structured Streaming (I intend to 
implement this in the open-source project Spline), the monitoring needs to 
track not only where the data was read from but also where the results were 
written to. To my knowledge, this could best be implemented by monitoring 
{{StreamingQueryProgress}}. However, batch data offsets are currently not 
available on the {{Sink}} interface. Implementing this as proposed would also 
bring symmetry to the {{StreamingQueryProgress}} fields {{sources}} and 
{{sink}}.

*Similar Proposals*

Made in the following JIRAs; these would not be sufficient for lineage 
tracking.
 * https://issues.apache.org/jira/browse/SPARK-18258
 * https://issues.apache.org/jira/browse/SPARK-21313

*Current State*
 * The method {{Sink#addBatch}} returns {{Unit}}.
 * {{StreamingQueryProgress}} reports the start and end {{offsetSeq}} in its 
{{sourceProgress}} value, but {{sinkProgress}} only contains the sink's 
{{toString}} output.

{code:java}
  "sources" : [ {
    "description" : "KafkaSource[Subscribe[test-topic]]",
    "startOffset" : null,
    "endOffset" : { "test-topic" : { "0" : 5000 }},
    "numInputRows" : 5000,
    "processedRowsPerSecond" : 645.3278265358803
  } ],
  "sink" : {
    "description" : 
"org.apache.spark.sql.execution.streaming.ConsoleSink@9da556f"
  }
{code}
 

 

*Proposed State*
 * Sink#addBatch to return an OffsetSeq or StreamProgress specifying the 
offsets of the written batch; Kafka, for example, returns this kind of 
information from its send method in a RecordMetadata object.
 * StreamingQueryProgress to incorporate sinkProgress in a similar fashion 
to sourceProgress.

 

 
{code:java}
  "sources" : [ {
    "description" : "KafkaSource[Subscribe[test-topic]]",
    "startOffset" : null,
    "endOffset" : { "test-topic" : { "0" : 5000 }},
    "numInputRows" : 5000,
    "processedRowsPerSecond" : 645.3278265358803
  } ],
  "sink" : {
    "description" : 
"org.apache.spark.sql.execution.streaming.ConsoleSink@9da556f",
   "startOffset" : null,
    "endOffset" { "sinkTopic": { "0": 333 }}
  }
{code}
 

*Implementation*
 * Sinks: Modify all sinks to conform to the new interface or return dummy 
values.
 * ProgressReporter: Merge offsets from different batches properly, 
similarly to how it is done for sources.

 

  was:
To be able to track data lineage for Structured Streaming (I intend to 
implement this in the [Open Source Project 
Spline|https://absaoss.github.io/spline/]), the monitoring needs to track 
not only where the data was read from but also where the results were 
written to. To my knowledge, this could best be implemented by monitoring 
StreamingQueryProgress. However, batch data offsets are currently not 
available on the Sink interface. Implementing this as proposed would also 
bring symmetry to the StreamingQueryProgress fields sources and sink.

*Similar Proposals*

Made in the following JIRAs; these would not be sufficient for lineage 
tracking.
 * https://issues.apache.org/jira/browse/SPARK-18258
 * https://issues.apache.org/jira/browse/SPARK-21313

*Current State*
 * The method Sink#addBatch returns Unit.
 * StreamingQueryProgress reports the start and end offsetSeq in its 
sourceProgress value, but sinkProgress only contains the sink's toString 
output.

{code:java}
  "sources" : [ {
    "description" : "KafkaSource[Subscribe[test-topic]]",
    "startOffset" : null,
    "endOffset" : { "test-topic" : { "0" : 5000 }},
    "numInputRows" : 5000,
    "processedRowsPerSecond" : 645.3278265358803
  } ],
  "sink" : {
    "description" : 
"org.apache.spark.sql.execution.streaming.ConsoleSink@9da556f"
  }
{code}
 

 

*Proposed State*
 * Sink#addBatch to return an OffsetSeq or StreamProgress specifying the 
offsets of the written batch; Kafka, for example, returns this kind of 
information from its send method in a RecordMetadata object.
 * StreamingQueryProgress to incorporate sinkProgress in a similar fashion 
to sourceProgress.

 

 
{code:java}
  "sources" : [ {
    "description" : "KafkaSource[Subscribe[test-topic]]",
    "startOffset" : null,
    "endOffset" : { "test-topic" : { "0" : 5000 }},
    "numInputRows" : 5000,
    "processedRowsPerSecond" : 645.3278265358803
  } ],
  "sink" : {
    "description" : 
"org.apache.spark.sql.execution.streaming.ConsoleSink@9da556f",
   "startOffset" : null,
    "endOffset" { "sinkTopic": { "0": 333 }}
  }
{code}
 

*Implementation*
 * PR submitters: Likely me and [~wajda], once it has been discussed here.
 * Sinks: Modify all sinks to conform to the new interface or return dummy 
values.
 * ProgressReporter: Merge offsets from different batches properly, 
similarly to how it is done for sources.

 


> Sink Should Return OffsetSeqs For ProgressReporting
> ---
>
> Key: SPARK-24647
> URL: https://issues.apache.org/jira/browse/SPARK-24647
> Project: Spark
>  Issue Type: 

[jira] [Resolved] (SPARK-24350) ClassCastException in "array_position" function

2018-05-24 Thread Alex Vayda (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Vayda resolved SPARK-24350.

Resolution: Fixed

> ClassCastException in "array_position" function
> ---
>
> Key: SPARK-24350
> URL: https://issues.apache.org/jira/browse/SPARK-24350
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Alex Vayda
>Priority: Major
> Fix For: 2.4.0
>
>
> When the {{array_position}} function is called with a wrong type for the 
> first operand, a {{ClassCastException}} is thrown instead of an 
> {{AnalysisException}}.
> Example:
> {code:sql}
> select array_position('foo', 'bar')
> {code}
> {noformat}
> java.lang.ClassCastException: org.apache.spark.sql.types.StringType$ cannot 
> be cast to org.apache.spark.sql.types.ArrayType
>   at 
> org.apache.spark.sql.catalyst.expressions.ArrayPosition.inputTypes(collectionOperations.scala:1398)
>   at 
> org.apache.spark.sql.catalyst.expressions.ExpectsInputTypes$class.checkInputDataTypes(ExpectsInputTypes.scala:44)
>   at 
> org.apache.spark.sql.catalyst.expressions.ArrayPosition.checkInputDataTypes(collectionOperations.scala:1401)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:168)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:168)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveAliases$$assignAliases$1$$anonfun$apply$3.applyOrElse(Analyzer.scala:256)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveAliases$$assignAliases$1$$anonfun$apply$3.applyOrElse(Analyzer.scala:252)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
> {noformat}
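
For illustration, the trace shows {{ArrayPosition.inputTypes}} casting 
{{left.dataType}} to {{ArrayType}} unconditionally; a defensive variant might 
pattern-match instead (hypothetical, simplified sketch, not the actual patch):

{code:scala}
import org.apache.spark.sql.types.{ArrayType, DataType}

// Match instead of casting, so a non-array first operand can be reported
// as a type-check failure (AnalysisException) during analysis rather than
// escaping as a ClassCastException.
def arrayElementType(dt: DataType): Option[DataType] = dt match {
  case ArrayType(elementType, _) => Some(elementType)
  case _                         => None
}
{code}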






[jira] [Comment Edited] (SPARK-23899) Built-in SQL Function Improvement

2018-05-22 Thread Alex Vayda (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484477#comment-16484477
 ] 

Alex Vayda edited comment on SPARK-23899 at 5/22/18 7:56 PM:
-

What do you guys think about adding another set of convenience functions for 
working with multi-dimensional arrays, e.g. matrix operations like 
{{transpose}}, {{multiply}}, and others?
Something similar to {{ml.linalg.Matrix}}


was (Author: wajda):
What do you guys think about adding another set of convenient functions for 
working with multi-dimentional arrays? E.g. matrix operations like 
{{transpose}}, {{multiply}} and others?
Something similar to {{ml.linalg.Matrix}}

> Built-in SQL Function Improvement
> -
>
> Key: SPARK-23899
> URL: https://issues.apache.org/jira/browse/SPARK-23899
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
> Fix For: 2.4.0
>
>
> This umbrella JIRA is to improve compatibility with other data processing 
> systems, including Hive, Teradata, Presto, Postgres, MySQL, DB2, Oracle, and 
> MS SQL Server.






[jira] [Commented] (SPARK-23899) Built-in SQL Function Improvement

2018-05-22 Thread Alex Vayda (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484477#comment-16484477
 ] 

Alex Vayda commented on SPARK-23899:


What do you guys think about adding another set of convenience functions for 
working with multi-dimensional arrays, e.g. matrix operations like 
{{transpose}}, {{multiply}}, and others?
Something similar to {{ml.linalg.Matrix}}
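
For illustration, such a transpose currently has to be hand-rolled, e.g. as a 
UDF over a nested-array column (hypothetical sketch; making operations like 
this built-in is the point of the suggestion):

{code:scala}
import org.apache.spark.sql.functions.udf

// Transpose for a column of type array<array<double>>; Scala's
// Seq#transpose requires all inner arrays to have the same length.
val transposeUdf = udf { (m: Seq[Seq[Double]]) => m.transpose }

// Usage: df.withColumn("mT", transposeUdf($"m"))
{code}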



