[jira] [Updated] (SPARK-26393) Different behaviors of date_add when calling it inside expr
[ https://issues.apache.org/jira/browse/SPARK-26393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Kamal` updated SPARK-26393:
---------------------------------
Description:
When calling date_add from pyspark.sql.functions directly, without using expr, like this:
{code:java}
df.withColumn("added", F.date_add(F.to_date(F.lit('1998-9-26')), F.col('days'))).toPandas()
{code}
it raises `TypeError: Column is not iterable`, because date_add only takes a number, not a column. But when I use it inside an expr, like this:
{code:java}
df.withColumn("added", F.expr("date_add(to_date('1998-9-26'), days)")).toPandas()
{code}
it works fine.

Shouldn't the two behave the same way? I think it is logical to accept a column here as well.

A Python notebook to demonstrate: [https://gist.github.com/AhmedKamal20/fec10337e815baa44f115d307e3b07eb]

was: (the same description, with minor typos corrected in this update)


> Different behaviors of date_add when calling it inside expr
> -----------------------------------------------------------
>
>                 Key: SPARK-26393
>                 URL: https://issues.apache.org/jira/browse/SPARK-26393
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.2
>            Reporter: Ahmed Kamal`
>            Priority: Minor
>
> (description as above)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26393) Different behaviors of date_add when calling it inside expr
Ahmed Kamal` created SPARK-26393:
------------------------------------

             Summary: Different behaviors of date_add when calling it inside expr
                 Key: SPARK-26393
                 URL: https://issues.apache.org/jira/browse/SPARK-26393
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.2
            Reporter: Ahmed Kamal`


When calling date_add from pyspark.sql.functions directly, without using expr, like this:
{code:java}
df.withColumn("added", F.date_add(F.to_date(F.lit('1998-9-26')), F.col('days'))).toPandas()
{code}
it raises `TypeError: Column is not iterable`, because date_add only takes a number, not a column. But when I use it inside an expr, like this:
{code:java}
df.withColumn("added", F.expr("date_add(to_date('1998-9-26'), days)")).toPandas()
{code}
it works fine.

Shouldn't the two behave the same way? I think it is logical to accept a column here as well.

A Python notebook to demonstrate: https://gist.github.com/AhmedKamal20/fec10337e815baa44f115d307e3b07eb
[jira] [Commented] (SPARK-14516) Clustering evaluator
[ https://issues.apache.org/jira/browse/SPARK-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239652#comment-15239652 ]

Ahmed Kamal commented on SPARK-14516:
-------------------------------------

I will go through the MLlib code to familiarize myself with its structure. Did we agree on the metrics that would be added? [~podongfeng], please let me know how you would like to share your current state/design with me. A Google document may be a good way, I think.

> Clustering evaluator
> --------------------
>
>                 Key: SPARK-14516
>                 URL: https://issues.apache.org/jira/browse/SPARK-14516
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: zhengruifeng
>            Priority: Minor
>
> MLlib does not have any general-purpose clustering metrics with a ground truth.
> In [Scikit-Learn](http://scikit-learn.org/stable/modules/classes.html#clustering-metrics), there are several kinds of metrics for this.
> It may be meaningful to add some clustering metrics into MLlib.
> This should be added as a {{ClusteringEvaluator}} class extending {{Evaluator}} in spark.ml.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14516) What about adding general clustering metrics?
[ https://issues.apache.org/jira/browse/SPARK-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236736#comment-15236736 ]

Ahmed Kamal commented on SPARK-14516:
-------------------------------------

[~srowen] I guess this could be a good starter issue for me in Spark. I can start working on silhouette if I get assigned.

> What about adding general clustering metrics?
> ---------------------------------------------
>
>                 Key: SPARK-14516
>                 URL: https://issues.apache.org/jira/browse/SPARK-14516
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML, MLlib
>            Reporter: zhengruifeng
>            Priority: Minor
>
> ML/MLlib don't have any general-purpose clustering metrics with a ground truth.
> In [Scikit-Learn](http://scikit-learn.org/stable/modules/classes.html#clustering-metrics), there are several kinds of metrics for this.
> It may be meaningful to add some clustering metrics into ML/MLlib.
[jira] [Commented] (SPARK-13769) Java Doc needs update in SparkSubmit.scala
[ https://issues.apache.org/jira/browse/SPARK-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186689#comment-15186689 ]

Ahmed Kamal commented on SPARK-13769:
-------------------------------------

I have created a pull request to fix this issue: https://github.com/apache/spark/pull/11600

> Java Doc needs update in SparkSubmit.scala
> ------------------------------------------
>
>                 Key: SPARK-13769
>                 URL: https://issues.apache.org/jira/browse/SPARK-13769
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Ahmed Kamal
>            Priority: Minor
>
> The Java doc here (https://github.com/apache/spark/blob/e97fc7f176f8bf501c9b3afd8410014e3b0e1602/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L51) needs to be updated from "The latter two operations are currently supported only for standalone cluster mode." to "The latter two operations are currently supported only for standalone and mesos cluster mode."
[jira] [Created] (SPARK-13769) Java Doc needs update in SparkSubmit.scala
Ahmed Kamal created SPARK-13769:
-----------------------------------

             Summary: Java Doc needs update in SparkSubmit.scala
                 Key: SPARK-13769
                 URL: https://issues.apache.org/jira/browse/SPARK-13769
             Project: Spark
          Issue Type: Bug
            Reporter: Ahmed Kamal
            Priority: Minor


The Java doc here (https://github.com/apache/spark/blob/e97fc7f176f8bf501c9b3afd8410014e3b0e1602/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L51) needs to be updated from "The latter two operations are currently supported only for standalone cluster mode." to "The latter two operations are currently supported only for standalone and mesos cluster mode."
[jira] [Comment Edited] (SPARK-12528) Make Apache Spark’s gateway hidden REST API (in standalone cluster mode) public API
[ https://issues.apache.org/jira/browse/SPARK-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175619#comment-15175619 ]

Ahmed Kamal edited comment on SPARK-12528 at 3/2/16 1:54 PM:
-------------------------------------------------------------

As mentioned in this issue's design document (https://issues.apache.org/jira/browse/SPARK-5338), the REST API supports Mesos too. Why doesn't Spark also make the API support YARN? YARN already has a REST API for job submission and monitoring, so I imagine this shouldn't be difficult, and it would become a standard way to submit jobs in a language-independent and cluster-independent way.

was (Author: akamal):
As mentioned in this issue's design document, the REST API supports Mesos too (https://issues.apache.org/jira/browse/SPARK-5338). Why doesn't Spark also make the API support YARN? YARN already has a REST API for job submission and monitoring, so I imagine this shouldn't be difficult, and it would become a standard way to submit jobs in a language-independent and cluster-independent way.

> Make Apache Spark’s gateway hidden REST API (in standalone cluster mode) public API
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-12528
>                 URL: https://issues.apache.org/jira/browse/SPARK-12528
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 2.0.0
>            Reporter: Youcef HILEM
>            Priority: Minor
>
> Spark has a hidden REST API which handles application submission, status checking and cancellation (https://issues.apache.org/jira/browse/SPARK-5388).
> There is enough interest in using this API to justify making it public:
> - https://github.com/ywilkof/spark-jobs-rest-client
> - https://github.com/yohanliyanage/jenkins-spark-deploy
> - https://github.com/spark-jobserver/spark-jobserver
> - http://stackoverflow.com/questions/28992802/triggering-spark-jobs-with-rest
> - http://stackoverflow.com/questions/34225879/how-to-submit-a-job-via-rest-api
> - http://arturmkrtchyan.com/apache-spark-hidden-rest-api
[jira] [Commented] (SPARK-12528) Make Apache Spark’s gateway hidden REST API (in standalone cluster mode) public API
[ https://issues.apache.org/jira/browse/SPARK-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175619#comment-15175619 ]

Ahmed Kamal commented on SPARK-12528:
-------------------------------------

As mentioned in this issue's design document, the REST API supports Mesos too (https://issues.apache.org/jira/browse/SPARK-5338). Why doesn't Spark also make the API support YARN? YARN already has a REST API for job submission and monitoring, so I imagine this shouldn't be difficult, and it would become a standard way to submit jobs in a language-independent and cluster-independent way.

> Make Apache Spark’s gateway hidden REST API (in standalone cluster mode) public API
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-12528
>                 URL: https://issues.apache.org/jira/browse/SPARK-12528
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 2.0.0
>            Reporter: Youcef HILEM
>            Priority: Minor
>
> Spark has a hidden REST API which handles application submission, status checking and cancellation (https://issues.apache.org/jira/browse/SPARK-5388).
> There is enough interest in using this API to justify making it public:
> - https://github.com/ywilkof/spark-jobs-rest-client
> - https://github.com/yohanliyanage/jenkins-spark-deploy
> - https://github.com/spark-jobserver/spark-jobserver
> - http://stackoverflow.com/questions/28992802/triggering-spark-jobs-with-rest
> - http://stackoverflow.com/questions/34225879/how-to-submit-a-job-via-rest-api
> - http://arturmkrtchyan.com/apache-spark-hidden-rest-api
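For context, the community write-ups linked in the issue describe the standalone gateway as listening on port 6066 and accepting a JSON {{CreateSubmissionRequest}} at {{/v1/submissions/create}}. A sketch of building such a request follows; since the API is hidden and unversioned, the field names here follow those write-ups rather than an official contract, and the master host, jar path, and main class are hypothetical placeholders:

```python
import json

master_url = "http://spark-master:6066"  # hypothetical standalone master host

# Request body shape per the community descriptions of the hidden REST gateway.
payload = {
    "action": "CreateSubmissionRequest",
    "appResource": "hdfs:///apps/my-app.jar",  # hypothetical application jar
    "mainClass": "com.example.MyApp",          # hypothetical main class
    "appArgs": ["arg1"],
    "clientSparkVersion": "2.0.0",
    "sparkProperties": {
        "spark.app.name": "my-app",
        "spark.master": "spark://spark-master:7077",
        "spark.submit.deployMode": "cluster",
    },
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
}

endpoint = master_url + "/v1/submissions/create"
print(endpoint)
print(json.dumps(payload, indent=2))

# Actually submitting against a live master would be along the lines of:
#   requests.post(endpoint, json=payload).json()
```

The language-independence argument in the comment follows directly: any HTTP client can build this JSON body, with no dependency on spark-submit or the JVM.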