[jira] [Commented] (SPARK-40502) Support dataframe API use jdbc data source in PySpark

2022-09-21 Thread CaoYu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607524#comment-17607524
 ] 

CaoYu commented on SPARK-40502:
---

When I designed a Python Flink course, I found that PyFlink did not have
the operators sum, min, minby, max, and maxby.

So I submitted a PR to the Flink community with a Python implementation
of these operators (FLINK-26609, FLINK-26728).

So, again, if a JDBC data source is something PySpark needs, I would love to
implement it, and I have the time to do so.

> Support dataframe API use jdbc data source in PySpark
> -
>
> Key: SPARK-40502
> URL: https://issues.apache.org/jira/browse/SPARK-40502
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: CaoYu
> Priority: Major
>
> When I use PySpark, I want to read data from a MySQL database, so I want to
> use JdbcRDD as in Java/Scala.
> But that is not supported in PySpark.
>
> For some reasons, I can't use the DataFrame API and can only use the RDD
> (datastream) API, even though I know the DataFrame API can read from a JDBC
> source fairly well.
>
> So I want to implement functionality that lets PySpark read data from a JDBC
> source through the RDD API.
>
> *But I don't know whether that is necessary for PySpark, so we can discuss it.*
>
> *If it is necessary for PySpark, I want to contribute it to Spark.*
> *I hope this Jira task can be assigned to me so that I can start working on
> implementing it.*
>
> *If not, please close this Jira task.*
>
> *Thanks a lot.*
>
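For context on the route the description mentions ("the DataFrame can get data from jdbc source"), here is a minimal sketch of reading a table through Spark's built-in "jdbc" data source and then dropping down to an RDD. It is not the RDD-level reader requested in this issue, only an illustration of what PySpark already offers. The helper names `jdbc_options` and `read_jdbc_as_rdd` are hypothetical, as are the URL, table name, and credentials in the usage note; the default driver class assumes MySQL Connector/J 8.

```python
from typing import Any, Dict


def jdbc_options(url: str, table: str, user: str, password: str,
                 driver: str = "com.mysql.cj.jdbc.Driver") -> Dict[str, str]:
    """Build the option map for Spark's built-in "jdbc" data source.

    The default driver class is an assumption (MySQL Connector/J 8);
    the driver jar must be on the Spark classpath for a real read.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver,
    }


def read_jdbc_as_rdd(spark: Any, options: Dict[str, str]):
    """Load the table via the DataFrame reader and return its underlying RDD.

    `spark` is expected to be a pyspark.sql.SparkSession; the returned RDD
    holds pyspark.sql.Row objects, one per table row.
    """
    df = spark.read.format("jdbc").options(**options).load()
    return df.rdd
```

With a live SparkSession and the JDBC driver jar available (for example passed with `--jars`), a call such as `read_jdbc_as_rdd(spark, jdbc_options("jdbc:mysql://localhost:3306/course_db", "students", "demo_user", "demo_password"))` would return an RDD that can be processed with `map`, `filter`, and the other RDD operators. All values in that call are placeholders.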



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40502) Support dataframe API use jdbc data source in PySpark

2022-09-20 Thread CaoYu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607523#comment-17607523
 ] 

CaoYu commented on SPARK-40502:
---

I am a teacher, and I recently designed a basic Python course with a
big data focus.

PySpark is one of its practical cases, but the course only uses simple RDD
code to do basic data processing, and using a JDBC data source is part of
the course.

DataFrames (Spark SQL) will be covered in advanced courses designed later.
So I hope the RDD (datastream) API can gain JDBC data source support.




[jira] [Commented] (SPARK-40502) Support dataframe API use jdbc data source in PySpark

2022-09-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607447#comment-17607447
 ] 

Hyukjin Kwon commented on SPARK-40502:
--

{quote}
For some reasons, I can't use the DataFrame API and can only use the RDD (datastream) API.
{quote}
What's the reason?
