[jira] [Commented] (SPARK-48286) Analyze 'exists' default expression instead of 'current' default expression in structField to v2 column conversion

2024-06-06 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852984#comment-17852984
 ] 

melin commented on SPARK-48286:
---

The defaultValueNotConstantError method does not exist.

> Analyze 'exists' default expression instead of 'current' default expression 
> in structField to v2 column conversion
> --
>
> Key: SPARK-48286
> URL: https://issues.apache.org/jira/browse/SPARK-48286
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Uros Stankovic
>Assignee: Uros Stankovic
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> The org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze method 
> accepts 3 parameters:
> 1) Field to analyze
> 2) Statement type - String
> 3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT
> The method 
> org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column
> passes fieldToAnalyze and EXISTS_DEFAULT, so EXISTS_DEFAULT is received as the 
> second parameter: it is treated as the statement type rather than the metadata 
> key, and the wrong default expression is analyzed.
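The mismatch, as a minimal Scala sketch (names follow the description above and 
assume metadataKey defaults to CURRENT_DEFAULT; the actual call sites in Spark 
may differ):
{code:scala}
// Described signature:
//   analyze(field: StructField, statementType: String, metadataKey: String): Expression

// Problematic call: EXISTS_DEFAULT lands in the statementType slot, so the
// metadata key silently stays CURRENT_DEFAULT and the 'current' default
// expression is analyzed instead of the 'exists' one.
val wrong = ResolveDefaultColumns.analyze(field, EXISTS_DEFAULT)

// One way to request the exists default explicitly: pass a real statement type
// and EXISTS_DEFAULT as the metadata key.
val right = ResolveDefaultColumns.analyze(field, "struct field to v2 column", EXISTS_DEFAULT)
{code}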



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48450) Support Jdbc datasource custom data partitioning

2024-05-28 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-48450:
--
Description: 
"partitionColumn, lowerBound, upperBound" is not an efficient table 
partitioning scheme for some databases, The amount of data in each partition is 
consistent. Such as: Oracle has more efficient data table partitioning,

[https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76]

  was:
"partitionColumn, lowerBound, upperBound" is not an efficient table 
partitioning scheme for some databases, such as: Oracle has more efficient data 
table partitioning,The amount of data in each partition is consistent

[https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76]


> Support Jdbc datasource custom data partitioning
> 
>
> Key: SPARK-48450
> URL: https://issues.apache.org/jira/browse/SPARK-48450
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: melin
>Priority: Major
>
> "partitionColumn, lowerBound, upperBound" is not an efficient table 
> partitioning scheme for some databases, The amount of data in each partition 
> is consistent. Such as: Oracle has more efficient data table partitioning,
> [https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76]
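For context, the stride-based partitioning available today looks like this (a 
minimal sketch assuming an existing SparkSession named spark; the URL, table, 
and column names are made up):
{code:scala}
// Spark splits [lowerBound, upperBound) into numPartitions equal strides on
// partitionColumn, so a skewed column produces partitions of very different
// sizes, which is the limitation described above.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/service") // hypothetical URL
  .option("dbtable", "SALES")                               // hypothetical table
  .option("partitionColumn", "ID")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .load()
{code}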



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48450) Support Jdbc datasource custom data partitioning

2024-05-28 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-48450:
--
Description: 
"partitionColumn, lowerBound, upperBound" is not an efficient table 
partitioning scheme for some databases, such as: Oracle has more efficient data 
table partitioning,The amount of data in each partition is consistent

[https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76]

  was:
"partitionColumn, lowerBound, upperBound" is not an efficient table 
partitioning scheme for some databases, such as: Oracle has more efficient data 
table partitioning

https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76


> Support Jdbc datasource custom data partitioning
> 
>
> Key: SPARK-48450
> URL: https://issues.apache.org/jira/browse/SPARK-48450
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: melin
>Priority: Major
>
> "partitionColumn, lowerBound, upperBound" is not an efficient table 
> partitioning scheme for some databases, such as: Oracle has more efficient 
> data table partitioning,The amount of data in each partition is consistent
> [https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48404) Driver and Executor support merge and run in a single jvm

2024-05-23 Thread melin (Jira)
melin created SPARK-48404:
-

 Summary: Driver and Executor support merge and run in a single jvm
 Key: SPARK-48404
 URL: https://issues.apache.org/jira/browse/SPARK-48404
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: melin


Spark is used in data integration scenarios (such as reading data from MySQL 
and writing it to other data sources), and in many cases such a task can run 
with a single degree of parallelism. Today the Driver and the Executor consume 
resources separately. If the Driver and Executor could be merged and run in a 
single JVM, especially when running in the cloud, it would reduce compute cost.
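For comparison, local mode already runs the driver and the executors in one JVM; 
a minimal sketch:
{code:scala}
import org.apache.spark.sql.SparkSession

// local[*] executes tasks as threads inside the driver JVM (one per core),
// so no separate executor processes or pods are launched.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("single-jvm-data-integration") // hypothetical app name
  .getOrCreate()
{code}
The proposal here is effectively to get the same single-JVM footprint for jobs 
submitted to a cluster manager such as Kubernetes.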



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47389) spark jdbc one insert with multiple values

2024-03-14 Thread melin (Jira)
melin created SPARK-47389:
-

 Summary: spark jdbc one insert with multiple values
 Key: SPARK-47389
 URL: https://issues.apache.org/jira/browse/SPARK-47389
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: melin


Many databases support writing multiple rows of data with a single INSERT 
statement. Write performance is better than batch execution of multiple 
single-row statements.

 

https://github.com/apache/spark/blob/9986462811f160eacd766da8a4e14a9cbb4b8710/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L725

 

example:

 
{code:java}
INSERT INTO Customers (Name, Age, Active) VALUES ('Name1',21,1);
INSERT INTO Customers (Name, Age, Active) VALUES ('Name2',21,1);
-- vs
INSERT INTO Customers (Name, Age, Active) VALUES ('Name1',21,1), ('Name2',21,1);
{code}
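A minimal sketch of how such a multi-row statement could be assembled on the 
write path (names are illustrative; this is not the existing JdbcUtils code):
{code:scala}
// Build one INSERT with N value tuples instead of issuing N single-row INSERTs.
def multiRowInsert(table: String, columns: Seq[String], rows: Int): String = {
  val tuple = columns.map(_ => "?").mkString("(", ", ", ")")
  val values = Seq.fill(rows)(tuple).mkString(", ")
  s"INSERT INTO $table (${columns.mkString(", ")}) VALUES $values"
}

// multiRowInsert("Customers", Seq("Name", "Age", "Active"), 2)
// => INSERT INTO Customers (Name, Age, Active) VALUES (?, ?, ?), (?, ?, ?)
{code}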
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47198) Is it possible to dynamically add backend service to ingress with Kubernetes?

2024-02-29 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-47198:
--
Description: 
Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid] 
path should forward to the UI console of the corresponding Spark app based on 
sparkappid. Spark apps are added and removed dynamically, so the ingress needs 
to add and remove the Spark services (svc) dynamically.

[sparkappid]_svc == spark svc name

[https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]

[~Qin Yao] 

  was:
spark on k8s runs multiple spark apps at the same time. proxy/[sparkappid] 
path, forwarding to different sparkapp ui console based on sparkappid. spark 
apps are dynamically added and decreased. ingress Dynamically adds spark svc.

[sparkappid]_svc == spark svc name

[https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]


> Is it possible to dynamically add backend service to ingress with Kubernetes?
> -
>
> Key: SPARK-47198
> URL: https://issues.apache.org/jira/browse/SPARK-47198
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: melin
>Priority: Major
>
> Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid] 
> path should forward to the UI console of the corresponding Spark app based on 
> sparkappid. Spark apps are added and removed dynamically, so the ingress needs 
> to add and remove the Spark services (svc) dynamically.
> [sparkappid]_svc == spark svc name
> [https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]
> [~Qin Yao] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47198) Is it possible to dynamically add backend service to ingress with Kubernetes?

2024-02-27 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-47198:
--
Description: 
Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid] 
path should forward to the UI console of the corresponding Spark app based on 
sparkappid. Spark apps are added and removed dynamically, so the ingress needs 
to add and remove the Spark services (svc) dynamically.

[sparkappid]_svc == spark svc name

[https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]

  was:
spark on k8s runs multiple spark apps at the same time. proxy/[sparkappid] 
path, forwarding to different sparkapp ui console based on sparkappid. spark 
apps are dynamically added and decreased. ingress Dynamically adds spark svc.

sparkappid == spark svc name

[https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]


> Is it possible to dynamically add backend service to ingress with Kubernetes?
> -
>
> Key: SPARK-47198
> URL: https://issues.apache.org/jira/browse/SPARK-47198
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: melin
>Priority: Major
>
> Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid] 
> path should forward to the UI console of the corresponding Spark app based on 
> sparkappid. Spark apps are added and removed dynamically, so the ingress needs 
> to add and remove the Spark services (svc) dynamically.
> [sparkappid]_svc == spark svc name
> [https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47198) Is it possible to dynamically add backend service to ingress with Kubernetes?

2024-02-27 Thread melin (Jira)
melin created SPARK-47198:
-

 Summary: Is it possible to dynamically add backend service to 
ingress with Kubernetes?
 Key: SPARK-47198
 URL: https://issues.apache.org/jira/browse/SPARK-47198
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 4.0.0
Reporter: melin


Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid] 
path should forward to the UI console of the corresponding Spark app based on 
sparkappid. Spark apps are added and removed dynamically, so the ingress needs 
to add and remove the Spark services (svc) dynamically.

sparkappid == spark svc name

[https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file

2024-02-27 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin resolved SPARK-47114.
---
Resolution: Resolved

> In the spark driver pod. Failed to access the krb5 file
> ---
>
> Key: SPARK-47114
> URL: https://issues.apache.org/jira/browse/SPARK-47114
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.4.1
>Reporter: melin
>Priority: Major
>
> Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). Pod 
> error logs:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf 
> loading failed{code}
> This error generally occurs when the krb5 file cannot be found
> [~yao] [~Qin Yao] 
> {code:java}
> ./bin/spark-submit \
>     --master k8s://https://172.18.5.44:6443 \
>     --deploy-mode cluster \
>     --name spark-pi \
>     --class org.apache.spark.examples.SparkPi \
>     --conf spark.executor.instances=1 \
>     --conf spark.kubernetes.submission.waitAppCompletion=true \
>     --conf spark.kubernetes.driver.pod.name=spark-xxx \
>     --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
>     --conf spark.kubernetes.driver.label.profile=production \
>     --conf spark.kubernetes.executor.label.profile=production \
>     --conf spark.kubernetes.namespace=superior \
>     --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>     --conf 
> spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0
>  \
>     --conf 
> spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
>     --conf spark.kubernetes.container.image.pullPolicy=Always \
>     --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
>     --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf  \
>     --conf spark.kerberos.principal=superior/ad...@datacyber.com  \
>     --conf spark.kerberos.keytab=/root/superior.keytab  \
>     
> file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar
>   5{code}
> {code:java}
> (base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
> Exception in thread "main" java.lang.IllegalArgumentException: Can't get 
> Kerberos realm
>         at 
> org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
>         at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
>         at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
>         at 
> org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
>         at 
> org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
>         at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
>         at 
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf 
> loading failed
>         at 
> java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.(Unknown
>  Source)
>         at 
> org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
>         at 
> org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
>         ... 13 more
> (base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
> Name:             spark-xxx
> Namespace:        superior
> Priority:         0
> Service Account:  spark
> Node:             cdh2/172.18.5.45
> Start Time:       Wed, 21 Feb 2024 15:48:08 +0800
> Labels:           profile=production
>                   spark-app-name=spark-pi
>                   spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
>                   spark-role=driver
>                   spark-version=3.4.2
> Annotations:      
> Status:           Failed
> IP:               10.244.1.4
> IPs:
>   IP:  10.244.1.4
> Containers:
>   spark-kubernetes-driver:
>     Container ID:  
> 

[jira] [Commented] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file

2024-02-27 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821469#comment-17821469
 ] 

melin commented on SPARK-47114:
---

The default JRE 17 does not support Kerberos; switch to a JDK.

> In the spark driver pod. Failed to access the krb5 file
> ---
>
> Key: SPARK-47114
> URL: https://issues.apache.org/jira/browse/SPARK-47114
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.4.1
>Reporter: melin
>Priority: Major
>
> Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). Pod 
> error logs:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf 
> loading failed{code}
> This error generally occurs when the krb5 file cannot be found
> [~yao] [~Qin Yao] 
> {code:java}
> ./bin/spark-submit \
>     --master k8s://https://172.18.5.44:6443 \
>     --deploy-mode cluster \
>     --name spark-pi \
>     --class org.apache.spark.examples.SparkPi \
>     --conf spark.executor.instances=1 \
>     --conf spark.kubernetes.submission.waitAppCompletion=true \
>     --conf spark.kubernetes.driver.pod.name=spark-xxx \
>     --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
>     --conf spark.kubernetes.driver.label.profile=production \
>     --conf spark.kubernetes.executor.label.profile=production \
>     --conf spark.kubernetes.namespace=superior \
>     --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>     --conf 
> spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0
>  \
>     --conf 
> spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
>     --conf spark.kubernetes.container.image.pullPolicy=Always \
>     --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
>     --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf  \
>     --conf spark.kerberos.principal=superior/ad...@datacyber.com  \
>     --conf spark.kerberos.keytab=/root/superior.keytab  \
>     
> file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar
>   5{code}
> {code:java}
> (base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
> Exception in thread "main" java.lang.IllegalArgumentException: Can't get 
> Kerberos realm
>         at 
> org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
>         at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
>         at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
>         at 
> org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
>         at 
> org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
>         at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
>         at 
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf 
> loading failed
>         at 
> java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.(Unknown
>  Source)
>         at 
> org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
>         at 
> org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
>         ... 13 more
> (base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
> Name:             spark-xxx
> Namespace:        superior
> Priority:         0
> Service Account:  spark
> Node:             cdh2/172.18.5.45
> Start Time:       Wed, 21 Feb 2024 15:48:08 +0800
> Labels:           profile=production
>                   spark-app-name=spark-pi
>                   spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
>                   spark-role=driver
>                   spark-version=3.4.2
> Annotations:      
> Status:           Failed
> IP:               10.244.1.4
> IPs:
>   IP:  10.244.1.4
> Containers:
>   spark-kubernetes-driver:
>     Container ID:  
> 

[jira] [Updated] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file

2024-02-21 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-47114:
--
Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). Pod 
error logs:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading 
failed{code}
This error generally occurs when the krb5 file cannot be found

[~yao] [~Qin Yao] 
{code:java}
./bin/spark-submit \
    --master k8s://https://172.18.5.44:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.submission.waitAppCompletion=true \
    --conf spark.kubernetes.driver.pod.name=spark-xxx \
    --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
    --conf spark.kubernetes.driver.label.profile=production \
    --conf spark.kubernetes.executor.label.profile=production \
    --conf spark.kubernetes.namespace=superior \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf 
spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0
 \
    --conf 
spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
    --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf  \
    --conf spark.kerberos.principal=superior/ad...@datacyber.com  \
    --conf spark.kerberos.keytab=/root/superior.keytab  \
    
file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar
  5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
Exception in thread "main" java.lang.IllegalArgumentException: Can't get 
Kerberos realm
        at 
org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
        at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
        at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
        at 
org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
        at 
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
        at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
        at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading 
failed
        at 
java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.(Unknown
 Source)
        at 
org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
        at 
org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
        ... 13 more
(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:             spark-xxx
Namespace:        superior
Priority:         0
Service Account:  spark
Node:             cdh2/172.18.5.45
Start Time:       Wed, 21 Feb 2024 15:48:08 +0800
Labels:           profile=production
                  spark-app-name=spark-pi
                  spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
                  spark-role=driver
                  spark-version=3.4.2
Annotations:      
Status:           Failed
IP:               10.244.1.4
IPs:
  IP:  10.244.1.4
Containers:
  spark-kubernetes-driver:
    Container ID:  
containerd://cceaf13b70cc5f21a639e71cb8663989ec73e122380844624d4bfac3946bae15
    Image:         spark:3.4.1
    Image ID:      
docker.io/library/spark@sha256:69fb485a0bcad88f9a2bf066e1b5d555f818126dc9df5a0b7e6a3b6d364bc694
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      org.apache.spark.examples.SparkPi
      spark-internal
      5
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 21 Feb 2024 15:49:54 +0800
      Finished:     Wed, 21 Feb 2024 15:49:56 

[jira] [Updated] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file

2024-02-20 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-47114:
--
Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). Pod 
error logs:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading 
failed{code}
This error generally occurs when the krb5 file cannot be found

[~yao] [~Qin Yao] 
{code:java}
./bin/spark-submit \
    --master k8s://https://172.18.5.44:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.submission.waitAppCompletion=true \
    --conf spark.kubernetes.driver.pod.name=spark-xxx \
    --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
    --conf spark.kubernetes.driver.label.profile=production \
    --conf spark.kubernetes.executor.label.profile=production \
    --conf spark.kubernetes.namespace=superior \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf 
spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0
 \
    --conf 
spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
    --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf  \
    --conf spark.kerberos.principal=superior/ad...@datacyber.com  \
    --conf spark.kerberos.keytab=/root/superior.keytab  \
    
file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar
  5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
Exception in thread "main" java.lang.IllegalArgumentException: Can't get 
Kerberos realm
        at 
org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
        at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
        at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
        at 
org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
        at 
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
        at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
        at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading 
failed
        at 
java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.(Unknown
 Source)
        at 
org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
        at 
org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
        ... 13 more
(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:             spark-xxx
Namespace:        superior
Priority:         0
Service Account:  spark
Node:             cdh2/172.18.5.45
Start Time:       Wed, 21 Feb 2024 15:48:08 +0800
Labels:           profile=production
                  spark-app-name=spark-pi
                  spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
                  spark-role=driver
                  spark-version=3.4.2
Annotations:      
Status:           Failed
IP:               10.244.1.4
IPs:
  IP:  10.244.1.4
Containers:
  spark-kubernetes-driver:
    Container ID:  
containerd://cceaf13b70cc5f21a639e71cb8663989ec73e122380844624d4bfac3946bae15
    Image:         spark:3.4.1
    Image ID:      
docker.io/library/spark@sha256:69fb485a0bcad88f9a2bf066e1b5d555f818126dc9df5a0b7e6a3b6d364bc694
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      org.apache.spark.examples.SparkPi
      spark-internal
      5
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 21 Feb 2024 15:49:54 +0800
      Finished:     Wed, 21 Feb 2024 15:49:56 

[jira] [Updated] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file

2024-02-20 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-47114:
--
Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). Pod 
error logs:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading 
failed{code}
This error generally occurs when the krb5 file cannot be found

[~yao] [~Qin Yao] 
{code:java}
./bin/spark-submit \
    --master k8s://https://172.18.5.44:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.submission.waitAppCompletion=true \
    --conf spark.kubernetes.driver.pod.name=spark-xxx \
    --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
    --conf spark.kubernetes.driver.label.profile=production \
    --conf spark.kubernetes.executor.label.profile=production \
    --conf spark.kubernetes.namespace=superior \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf 
spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0
 \
    --conf 
spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
    --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf  \
    --conf spark.kerberos.principal=superior/ad...@datacyber.com  \
    --conf spark.kerberos.keytab=/root/superior.keytab  \
    --conf 
spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml
 \
    --conf 
spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml
 \
    
file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar
  5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
Exception in thread "main" java.lang.IllegalArgumentException: Can't get 
Kerberos realm
        at 
org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
        at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
        at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
        at 
org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
        at 
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
        at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
        at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading 
failed
        at 
java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.(Unknown
 Source)
        at 
org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
        at 
org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
        ... 13 more
(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:             spark-xxx
Namespace:        superior
Priority:         0
Service Account:  spark
Node:             cdh2/172.18.5.45
Start Time:       Wed, 21 Feb 2024 15:48:08 +0800
Labels:           profile=production
                  spark-app-name=spark-pi
                  spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
                  spark-role=driver
                  spark-version=3.4.2
Annotations:      
Status:           Failed
IP:               10.244.1.4
IPs:
  IP:  10.244.1.4
Containers:
  spark-kubernetes-driver:
    Container ID:  
containerd://cceaf13b70cc5f21a639e71cb8663989ec73e122380844624d4bfac3946bae15
    Image:         spark:3.4.1
    Image ID:      
docker.io/library/spark@sha256:69fb485a0bcad88f9a2bf066e1b5d555f818126dc9df5a0b7e6a3b6d364bc694
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      

[jira] [Updated] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file

2024-02-20 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-47114:
--
Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). Pod 
error logs:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading 
failed{code}
This error generally occurs when the krb5 file cannot be found

[~yao] [~Qin Yao] 
{code:java}
./bin/spark-submit \
    --master k8s://https://172.18.5.44:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.submission.waitAppCompletion=true \
    --conf spark.kubernetes.driver.pod.name=spark-xxx \
    --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
    --conf spark.kubernetes.driver.label.profile=production \
    --conf spark.kubernetes.executor.label.profile=production \
    --conf spark.kubernetes.namespace=superior \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf 
spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0
 \
    --conf 
spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
    --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf  \
    --conf spark.kerberos.principal=superior/ad...@datacyber.com  \
    --conf spark.kerberos.keytab=/root/superior.keytab  \
    --conf 
spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml
 \
    --conf 
spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml
 \
    
file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar
  5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ '[' -z /opt/java/openjdk ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
++ command -v readarray
+ '[' readarray ']'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf 
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=10.244.2.56 --deploy-mode client --properties-file 
/opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi 
spark-internal 5
Exception in thread "main" java.lang.IllegalArgumentException: Can't get 
Kerberos realm
        at 
org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
        at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
        at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
        at 
org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
        at 
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
        at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
        at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading 
failed
        at 
java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.(Unknown
 Source)
        at 
org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
        at 
org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
        ... 13 more
(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:             

[jira] [Created] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file

2024-02-20 Thread melin (Jira)
melin created SPARK-47114:
-

 Summary: In the spark driver pod. Failed to access the krb5 file
 Key: SPARK-47114
 URL: https://issues.apache.org/jira/browse/SPARK-47114
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 3.4.1
Reporter: melin


Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos).

 
{code:java}
./bin/spark-submit \
    --master k8s://https://172.18.5.44:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.submission.waitAppCompletion=true \
    --conf spark.kubernetes.driver.pod.name=spark-xxx \
    --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
    --conf spark.kubernetes.driver.label.profile=production \
    --conf spark.kubernetes.executor.label.profile=production \
    --conf spark.kubernetes.namespace=superior \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf 
spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0
 \
    --conf 
spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
    --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf  \
    --conf spark.kerberos.principal=superior/ad...@datacyber.com  \
    --conf spark.kerberos.keytab=/root/superior.keytab  \
    --conf 
spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml
 \
    --conf 
spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml
 \
    
file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar
  5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ '[' -z /opt/java/openjdk ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
++ command -v readarray
+ '[' readarray ']'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf 
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=10.244.2.56 --deploy-mode client --properties-file 
/opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi 
spark-internal 5
Exception in thread "main" java.lang.IllegalArgumentException: Can't get 
Kerberos realm
        at 
org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
        at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
        at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
        at 
org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
        at 
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
        at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
        at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading 
failed
        at 
java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.(Unknown
 Source)
        at 
org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
        at 
org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
        ... 13 more
(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:             spark-xxx

[jira] [Created] (SPARK-46572) [SQL][Enhancement] hint enhancement

2024-01-02 Thread melin (Jira)
melin created SPARK-46572:
-

 Summary: [SQL][Enhancement] hint enhancement
 Key: SPARK-46572
 URL: https://issues.apache.org/jira/browse/SPARK-46572
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: melin


https://github.com/StarRocks/starrocks/pull/37356



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-12-26 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43338:
--
Description: 
{code:java}
private[sql] object CatalogManager {
  val SESSION_CATALOG_NAME: String = "spark_catalog"
}{code}
 
The SESSION_CATALOG_NAME value cannot be modified.

If the platform supports both Hive and Spark SQL, hive_metastore is a more 
appropriate name for the metadata catalog; users copy table names that already 
carry the hive_metastore catalog prefix. In this case, the default Spark 
catalog name needs to be changed.

 

!image-2023-12-27-09-55-55-693.png!

[~fanjia] 
 

  was:
{code:java}
private[sql] object CatalogManager {
val SESSION_CATALOG_NAME: String = "spark_catalog"
}{code}
 
The SESSION_CATALOG_NAME value cannot be modified。

If multiple Hive Metastores exist, the platform manages multiple hms metadata 
and classifies them by catalogName. A different catalog name is required

[~fanjia] 


> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
> Attachments: image-2023-12-27-09-55-55-693.png
>
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If the platform supports both Hive and Spark SQL, hive_metastore is a more 
> appropriate name for the metadata catalog; users copy table names that already 
> carry the hive_metastore catalog prefix. In this case, the default Spark 
> catalog name needs to be changed.
>  
> !image-2023-12-27-09-55-55-693.png!
> [~fanjia] 
>  
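A minimal sketch of what making the name configurable could look like (the conf 
key below is invented for illustration; it is not an existing Spark 
configuration):
{code:scala}
import org.apache.spark.sql.internal.SQLConf

// Hypothetical: resolve the session catalog name from a SQL conf instead of a
// hard-coded constant. "spark.sql.catalog.sessionCatalogName" is an invented key.
private[sql] object CatalogManager {
  def sessionCatalogName: String =
    SQLConf.get.getConfString("spark.sql.catalog.sessionCatalogName", "spark_catalog")
}
{code}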



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-12-26 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43338:
--
Attachment: image-2023-12-27-09-55-55-693.png

> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
> Attachments: image-2023-12-27-09-55-55-693.png
>
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages metadata from multiple 
> HMS instances and classifies them by catalog name, so a different catalog name 
> is required.
> [~fanjia] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46518) Support for copy from write compatible postgresql databases (pg, redshift, snowflake, gauss)

2023-12-26 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-46518:
--
Description: 
Many databases are compatible with PostgreSQL syntax and support COPY FROM. 
COPY FROM import performance is 10 times higher than that of JDBC batch 
inserts.

[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/CopyHelper.scala]

Supports upsert data import: 
[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/DataTunnelJdbcRelationProvider.scala]

!image-2023-12-27-09-44-19-292.png!

 

[~yao] 

  was:
Now many databases are compatible with pg syntax and support copy from syntax. 
The copy form import performance is 10 times higher than that of jdbc batch.

[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/CopyHelper.scala]

Supports upsert data import: 
[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/DataTunnelJdbcRelationProvider.scala]

!image-2023-12-27-09-43-01-529.png!

 

 


> Support for copy from write compatible postgresql databases (pg, redshift, 
> snowflake, gauss)
> 
>
> Key: SPARK-46518
> URL: https://issues.apache.org/jira/browse/SPARK-46518
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: melin
>Priority: Major
> Attachments: image-2023-12-27-09-44-19-292.png
>
>
> Many databases are compatible with PostgreSQL syntax and support COPY FROM. 
> COPY FROM import performance is 10 times higher than that of JDBC batch 
> inserts.
> [https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/CopyHelper.scala]
> Supports upsert data import: 
> [https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/DataTunnelJdbcRelationProvider.scala]
> !image-2023-12-27-09-44-19-292.png!
>  
> [~yao] 
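A minimal sketch of the COPY FROM STDIN path through the pgjdbc driver 
(connection details and table name are made up; see the linked CopyHelper for 
the author's implementation):
{code:scala}
import java.io.StringReader
import java.sql.DriverManager
import org.postgresql.PGConnection

// COPY streams rows to the server in bulk instead of executing per-row
// (even batched) INSERT statements, which is where the speedup comes from.
val conn = DriverManager.getConnection("jdbc:postgresql://dbhost:5432/db", "user", "pass")
try {
  val copyManager = conn.unwrap(classOf[PGConnection]).getCopyAPI
  val rows = new StringReader("1,alice\n2,bob\n")
  copyManager.copyIn("COPY target_table (id, name) FROM STDIN WITH (FORMAT csv)", rows)
} finally {
  conn.close()
}
{code}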



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46518) Support for copy from write compatible postgresql databases (pg, redshift, snowflake, gauss)

2023-12-26 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-46518:
--
Attachment: image-2023-12-27-09-44-19-292.png

> Support for copy from write compatible postgresql databases (pg, redshift, 
> snowflake, gauss)
> 
>
> Key: SPARK-46518
> URL: https://issues.apache.org/jira/browse/SPARK-46518
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: melin
>Priority: Major
> Attachments: image-2023-12-27-09-44-19-292.png
>
>
> Many databases are compatible with PostgreSQL syntax and support COPY FROM. 
> COPY FROM import performance is 10 times higher than that of JDBC batch 
> inserts.
> [https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/CopyHelper.scala]
> Supports upsert data import: 
> [https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/DataTunnelJdbcRelationProvider.scala]
> !image-2023-12-27-09-43-01-529.png!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46518) Support for copy from write compatible postgresql databases (pg, redshift, snowflake, gauss)

2023-12-26 Thread melin (Jira)
melin created SPARK-46518:
-

 Summary: Support for copy from write compatible postgresql 
databases (pg, redshift, snowflake, gauss)
 Key: SPARK-46518
 URL: https://issues.apache.org/jira/browse/SPARK-46518
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: melin


Many databases are compatible with PostgreSQL syntax and support COPY FROM. 
COPY FROM import performance is 10 times higher than that of JDBC batch 
inserts.

[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/CopyHelper.scala]

Supports upsert data import: 
[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/DataTunnelJdbcRelationProvider.scala]

!image-2023-12-27-09-43-01-529.png!

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46511) Optimize spark jdbc write speed with Multi-Row Inserts

2023-12-26 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin resolved SPARK-46511.
---
Resolution: Fixed

> Optimize spark jdbc write speed with Multi-Row Inserts
> --
>
> Key: SPARK-46511
> URL: https://issues.apache.org/jira/browse/SPARK-46511
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: melin
>Priority: Major
>
> INSERT INTO table_name (column1, column2, column3)
> VALUES (value1, value2, value3),
> (value4, value5, value6),
> (value7, value8, value9);



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46511) Optimize spark jdbc write speed with Multi-Row Inserts

2023-12-25 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-46511:
--
Description: 
[https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]

MySQL, PostgreSQL, Oracle 23c, and SQL Server support:

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3),
(value4, value5, value6),
(value7, value8, value9);
 

  was:
[https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]

mysql, pg、Oracle 23c, sqlserver support:

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3),
(value4, value5, value6),
(value7, value8, value9);


> Optimize spark jdbc write speed with Multi-Row Inserts
> --
>
> Key: SPARK-46511
> URL: https://issues.apache.org/jira/browse/SPARK-46511
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: melin
>Priority: Major
>
> [https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]
> MySQL, PostgreSQL, Oracle 23c, and SQL Server support:
> INSERT INTO table_name (column1, column2, column3)
> VALUES (value1, value2, value3),
> (value4, value5, value6),
> (value7, value8, value9);
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46511) Optimize spark jdbc write speed with Multi-Row Inserts

2023-12-25 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-46511:
--
Description: 
[https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]

MySQL, PostgreSQL, Oracle 23c, and SQL Server support:

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3),
(value4, value5, value6),
(value7, value8, value9);

  was:
[https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]

mysql, pg、Oracle 23c, sqlserver

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3),
(value4, value5, value6),
(value7, value8, value9);


> Optimize spark jdbc write speed with Multi-Row Inserts
> --
>
> Key: SPARK-46511
> URL: https://issues.apache.org/jira/browse/SPARK-46511
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: melin
>Priority: Major
>
> [https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]
> MySQL, PostgreSQL, Oracle 23c, and SQL Server support:
> INSERT INTO table_name (column1, column2, column3)
> VALUES (value1, value2, value3),
> (value4, value5, value6),
> (value7, value8, value9);



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46511) Optimize spark jdbc write speed with Multi-Row Inserts

2023-12-25 Thread melin (Jira)
melin created SPARK-46511:
-

 Summary: Optimize spark jdbc write speed with Multi-Row Inserts
 Key: SPARK-46511
 URL: https://issues.apache.org/jira/browse/SPARK-46511
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
 Environment: 
[https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]

MySQL, PostgreSQL, Oracle 23c, SQL Server

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3),
(value4, value5, value6),
(value7, value8, value9);
Reporter: melin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-12-14 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796975#comment-17796975
 ] 

melin commented on SPARK-43338:
---

[~yao] Databricks supports changing it via spark.databricks.sql.initial.catalog.name:

https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html

> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages multiple hms metadata 
> and classifies them by catalogName. A different catalog name is required
> [~fanjia] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46195) Supports parse multiple sql statements

2023-11-30 Thread melin (Jira)
melin created SPARK-46195:
-

 Summary: Supports parse multiple sql statements
 Key: SPARK-46195
 URL: https://issues.apache.org/jira/browse/SPARK-46195
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: melin


 
In the SqlBaseParser.g4 file, add the following rules to support parsing multiple 
SQL statements. Note: select * from (select * from test) resolves into two 
statements, so an alias needs to be added.
{code:java}
sqlStatements
: singleStatement* EOF
;

singleStatement
: statement SEMICOLON?
; {code}
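Until the grammar supports this, a caller has to split scripts before parsing; a naive sketch of such a splitter (it only tracks single-quoted strings and ignores comments, so it is an illustration rather than a robust implementation):
{code:scala}
// Naive sketch: split a SQL script on top-level semicolons before handing each
// piece to the parser. It only tracks single-quoted strings and ignores comments.
def splitStatements(script: String): Seq[String] = {
  val statements = scala.collection.mutable.ArrayBuffer.empty[String]
  val current = new StringBuilder
  var inQuote = false
  script.foreach { ch =>
    if (ch == '\'') inQuote = !inQuote
    if (ch == ';' && !inQuote) {
      if (current.toString.trim.nonEmpty) statements += current.toString.trim
      current.clear()
    } else {
      current.append(ch)
    }
  }
  if (current.toString.trim.nonEmpty) statements += current.toString.trim
  statements.toSeq
}

// splitStatements("select 1; select * from test;") == Seq("select 1", "select * from test")
{code}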
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45818) Support columnar/vectorized evaluation engine JDK17/21

2023-11-07 Thread melin (Jira)
melin created SPARK-45818:
-

 Summary: Support columnar/vectorized evaluation engine JDK17/21
 Key: SPARK-45818
 URL: https://issues.apache.org/jira/browse/SPARK-45818
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: melin


Trino uses a JDK-based columnar/vectorized evaluation engine to improve performance:

[https://github.com/trinodb/trino/pull/19302]

[https://github.com/trinodb/trino/issues/14237]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45140) Support ddl output json format

2023-09-12 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764488#comment-17764488
 ] 

melin commented on SPARK-45140:
---

We have a product that uses JDBC to collect metadata from various databases. To 
support Hudi / Paimon DDL, we start a Spark Thrift Server and collect Hive table 
metadata by running "show create table extended table_name" to obtain the table 
details. Parsing the plain-text output is very inconvenient, so we would like JSON 
output to be supported.
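As a stopgap, Spark can serialize the DESCRIBE output itself; a small sketch, with the table name as a placeholder and the JSON shape being simply the DESCRIBE rows rather than a structured DDL document:
{code:scala}
// Sketch of a current workaround: let Spark serialize the DESCRIBE rows to JSON
// instead of parsing the plain-text layout. The table name is a placeholder.
val describeJson: Array[String] =
  spark.sql("DESCRIBE TABLE EXTENDED db.sample_table").toJSON.collect()

describeJson.foreach(println)
// e.g. {"col_name":"id","data_type":"bigint","comment":""}
{code}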
 

> Support ddl output json format
> --
>
> Key: SPARK-45140
> URL: https://issues.apache.org/jira/browse/SPARK-45140
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
> Environment:  
>  
>  
>  
>Reporter: melin
>Priority: Major
>
> hive supports ddl output json format. set hive.ddl.output.format=json;
> hive table metadata is collected and output in json format for easy parsing
> [~yao] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45140) Support ddl output json format

2023-09-12 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-45140:
--
Description: 
hive supports ddl output json format. set hive.ddl.output.format=json;

hive table metadata is collected and output in json format for easy parsing

[~yao] 

  was:
hive supports ddl output json format. set hive.ddl.output.format=json;

hive table metadata is collected and output in json format for easy parsing

 


> Support ddl output json format
> --
>
> Key: SPARK-45140
> URL: https://issues.apache.org/jira/browse/SPARK-45140
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
> Environment:  
>  
>  
>  
>Reporter: melin
>Priority: Major
>
> hive supports ddl output json format. set hive.ddl.output.format=json;
> hive table metadata is collected and output in json format for easy parsing
> [~yao] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45140) Support ddl output json format

2023-09-12 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-45140:
--
Description: 
hive supports ddl output json format. set hive.ddl.output.format=json;

hive table metadata is collected and output in json format for easy parsing

 

> Support ddl output json format
> --
>
> Key: SPARK-45140
> URL: https://issues.apache.org/jira/browse/SPARK-45140
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
> Environment:  
>  
>  
>  
>Reporter: melin
>Priority: Major
>
> hive supports ddl output json format. set hive.ddl.output.format=json;
> hive table metadata is collected and output in json format for easy parsing
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45140) Support ddl output json format

2023-09-12 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-45140:
--
Environment: 
 

 
 
 

  was:
hive supports ddl output json format. set hive.ddl.output.format=json;

hive table metadata is collected and output in json format for easy parsing

 

 
 


> Support ddl output json format
> --
>
> Key: SPARK-45140
> URL: https://issues.apache.org/jira/browse/SPARK-45140
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
> Environment:  
>  
>  
>  
>Reporter: melin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45140) Support ddl output json format

2023-09-12 Thread melin (Jira)
melin created SPARK-45140:
-

 Summary: Support ddl output json format
 Key: SPARK-45140
 URL: https://issues.apache.org/jira/browse/SPARK-45140
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
 Environment: hive supports ddl output json format. set 
hive.ddl.output.format=json;

hive table metadata is collected and output in json format for easy parsing

 

 
 
Reporter: melin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-07-17 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743701#comment-17743701
 ] 

melin commented on SPARK-43338:
---

[~yao]

Would you consider supporting a custom catalog name? For example:

spark.sql.session.catalog.default.name=spark_catalog
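What is possible today is registering the same metastore under an additional catalog name via the DataSource V2 catalog configuration, although the spark_catalog name itself still cannot be changed; a sketch, where the catalog name, plugin class and its options are placeholders rather than built-in Spark defaults:
{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch of what is possible today: the default "spark_catalog" name is fixed,
// but an extra catalog name can be registered and pointed at a CatalogPlugin.
// The catalog name, plugin class and its options below are placeholders.
val spark = SparkSession.builder()
  .appName("custom-catalog-name")
  .config("spark.sql.catalog.prod_hms", "com.example.MyHiveCatalogPlugin")
  .config("spark.sql.catalog.prod_hms.hive.metastore.uris", "thrift://hms-host:9083")
  .enableHiveSupport()
  .getOrCreate()

// Tables can then be addressed as prod_hms.db.table, while spark_catalog keeps its name.
spark.sql("SELECT * FROM prod_hms.sales.orders").show()
{code}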

> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages multiple hms metadata 
> and classifies them by catalogName. A different catalog name is required
> [~fanjia] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44015) Support SIMILAR TO operator

2023-06-09 Thread melin (Jira)
melin created SPARK-44015:
-

 Summary: Support SIMILAR TO operator
 Key: SPARK-44015
 URL: https://issues.apache.org/jira/browse/SPARK-44015
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.5.0
Reporter: melin


https://www.w3resource.com/PostgreSQL/postgresql-similar-operator.php
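Spark has no SIMILAR TO today; the closest existing operator is RLIKE with a Java regex, although the pattern languages differ. A small sketch using an inline sample DataFrame:
{code:scala}
import org.apache.spark.sql.functions.col

// Sketch of the closest existing equivalent: RLIKE with a Java regex.
// PostgreSQL's  name SIMILAR TO '%(b|d)%'  (name contains a 'b' or a 'd')
// roughly corresponds to the unanchored regex below.
val df = spark.createDataFrame(Seq((1, "abc"), (2, "xyz"))).toDF("id", "name")
val matched = df.filter(col("name").rlike("(b|d)"))   // keeps only "abc"

// The same predicate in Spark SQL:
df.createOrReplaceTempView("employees")
spark.sql("SELECT * FROM employees WHERE name RLIKE '(b|d)'").show()
{code}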



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44014) Support BETWEEN SYMMETRIC operator

2023-06-09 Thread melin (Jira)
melin created SPARK-44014:
-

 Summary: Support BETWEEN SYMMETRIC operator
 Key: SPARK-44014
 URL: https://issues.apache.org/jira/browse/SPARK-44014
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.5.0
Reporter: melin


https://andreigridnev.com/blog/2016-03-20-between-symmetric-operator-in-postgresql/
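Until such an operator exists, the symmetric behaviour can be written with least/greatest around an ordinary between; a small sketch with inline sample data:
{code:scala}
import org.apache.spark.sql.functions.{col, greatest, least, lit}

// Sketch of the current equivalent of  x BETWEEN SYMMETRIC 10 AND 3 :
// order the two bounds explicitly with least/greatest before calling between.
val df = spark.createDataFrame(Seq(1, 5, 20).map(Tuple1(_))).toDF("x")
val lower = least(lit(10), lit(3))
val upper = greatest(lit(10), lit(3))
val result = df.filter(col("x").between(lower, upper))   // keeps 5
{code}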



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43776) [BUG] MySQL jdbc cursor has not taken effect

2023-05-29 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727226#comment-17727226
 ] 

melin commented on SPARK-43776:
---

To stream results from MySQL over JDBC, stmt.setFetchSize(Integer.MIN_VALUE) must be 
set; otherwise the driver reads in buffered mode, which easily causes OOM.

JDBCRDD reads data in buffered mode, while JdbcRDD is set by default to read data in 
streaming mode. The two RDDs are inconsistent.

[~srowen]   
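For reference, the streaming pattern the MySQL driver documents looks like this in plain JDBC (connection details and table name are placeholders); this is what the ticket argues JDBCRDD should also apply for MySQL URLs:
{code:scala}
import java.sql.{DriverManager, ResultSet}

// Sketch of MySQL Connector/J streaming mode: a forward-only, read-only statement
// with fetch size Integer.MIN_VALUE streams rows instead of buffering the whole
// result set in memory. URL, credentials and table name are placeholders.
val conn = DriverManager.getConnection("jdbc:mysql://host:3306/db", "user", "pass")
val stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
stmt.setFetchSize(Integer.MIN_VALUE)   // without this the driver buffers everything

val rs = stmt.executeQuery("SELECT * FROM big_table")
while (rs.next()) {
  // process one row at a time without holding the full result set in memory
}
rs.close(); stmt.close(); conn.close()
{code}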

> [BUG] MySQL jdbc cursor has not taken effect
> 
>
> Key: SPARK-43776
> URL: https://issues.apache.org/jira/browse/SPARK-43776
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: melin
>Priority: Major
>
> JDBCRDD.scala:  stmt.setFetchSize(options.fetchSize)
>  
> JdbcRDD.scala:
> {code:java}
> if (url.startsWith("jdbc:mysql:")){ 
> // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force
> // streaming results, rather than pulling entire resultset into memory.
> // See the below URL
> // 
> dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
>   stmt.setFetchSize(Integer.MIN_VALUE) }
> else
> { stmt.setFetchSize(100) }
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43805) Support SELECT * EXCEPT AND SELECT * REPLACE

2023-05-25 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43805:
--
Description: 
ref: 
[https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]

https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_replace

[~fanjia] 

  was:
ref: 
[https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]

[~fanjia] 


> Support SELECT * EXCEPT AND  SELECT * REPLACE
> -
>
> Key: SPARK-43805
> URL: https://issues.apache.org/jira/browse/SPARK-43805
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> ref: 
> [https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
> https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_replace
> [~fanjia] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43805) Support SELECT * EXCEPT AND SELECT * REPLACE

2023-05-25 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43805:
--
Description: 
ref: 
[https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]

[~fanjia] 

  was:
ref: 
[https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]

 


> Support SELECT * EXCEPT AND  SELECT * REPLACE
> -
>
> Key: SPARK-43805
> URL: https://issues.apache.org/jira/browse/SPARK-43805
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> ref: 
> [https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
> [~fanjia] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43805) Support SELECT * EXCEPT AND SELECT * REPLACE

2023-05-25 Thread melin (Jira)
melin created SPARK-43805:
-

 Summary: Support SELECT * EXCEPT AND  SELECT * REPLACE
 Key: SPARK-43805
 URL: https://issues.apache.org/jira/browse/SPARK-43805
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.5.0
Reporter: melin


ref: 
[https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
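Neither clause exists in Spark SQL today; in the DataFrame API the usual equivalents are drop and withColumn. A sketch with inline sample data and placeholder column names:
{code:scala}
import org.apache.spark.sql.functions.{lit, round}

// Sketches of what is expressible today in the DataFrame API; names are placeholders.
val orders = spark.createDataFrame(Seq((1, "widget-a", 9.991), (2, "widget-b", 5.5)))
  .toDF("order_id", "item_name", "price")

// SELECT * EXCEPT (order_id) FROM orders
val exceptDf = orders.drop("order_id")

// SELECT * REPLACE ("widget" AS item_name) FROM orders
val replaceName = orders.withColumn("item_name", lit("widget"))

// SELECT * REPLACE (ROUND(price, 2) AS price) FROM orders
val replacePrice = orders.withColumn("price", round(orders("price"), 2))
{code}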

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43776) [BUG] MySQL jdbc cursor has not taken effect

2023-05-24 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43776:
--
Description: 
JDBCRDD.scala:  stmt.setFetchSize(options.fetchSize)

 
JdbcRDD.scala:
{code:java}
if (url.startsWith("jdbc:mysql:")) {
  // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force
  // streaming results, rather than pulling entire resultset into memory.
  // See the below URL
  // dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
  stmt.setFetchSize(Integer.MIN_VALUE)
} else {
  stmt.setFetchSize(100)
}
{code}
 

 

  was:
JDBCRDD.scala:  stmt.setFetchSize(options.fetchSize)

 
JdbcRDD.scala:
{code:java}
if (url.startsWith("jdbc:mysql:"))
{ // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force // 
streaming results, rather than pulling entire resultset into memory. // See the 
below URL // 
dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
 stmt.setFetchSize(Integer.MIN_VALUE) }
else
{ stmt.setFetchSize(100) }
{code}
 

 


> [BUG] MySQL jdbc cursor has not taken effect
> 
>
> Key: SPARK-43776
> URL: https://issues.apache.org/jira/browse/SPARK-43776
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: melin
>Priority: Major
>
> JDBCRDD.scala:  stmt.setFetchSize(options.fetchSize)
>  
> JdbcRDD.scala:
> {code:java}
> if (url.startsWith("jdbc:mysql:")){ 
> // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force
> // streaming results, rather than pulling entire resultset into memory.
> // See the below URL
> // 
> dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
>   stmt.setFetchSize(Integer.MIN_VALUE) }
> else
> { stmt.setFetchSize(100) }
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43776) [BUG] MySQL jdbc cursor has not taken effect

2023-05-24 Thread melin (Jira)
melin created SPARK-43776:
-

 Summary: [BUG] MySQL jdbc cursor has not taken effect
 Key: SPARK-43776
 URL: https://issues.apache.org/jira/browse/SPARK-43776
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.4.1, 3.5.0
Reporter: melin


JDBCRDD.scala:  stmt.setFetchSize(options.fetchSize)

 
JdbcRDD.scala:
```
if (url.startsWith("jdbc:mysql:")) {
// setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force
// streaming results, rather than pulling entire resultset into memory.
// See the below URL
// 
dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html

stmt.setFetchSize(Integer.MIN_VALUE)
} else {
stmt.setFetchSize(100)
}
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43776) [BUG] MySQL jdbc cursor has not taken effect

2023-05-24 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43776:
--
Description: 
JDBCRDD.scala:  stmt.setFetchSize(options.fetchSize)

 
JdbcRDD.scala:
{code:java}
if (url.startsWith("jdbc:mysql:"))
{ // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force // 
streaming results, rather than pulling entire resultset into memory. // See the 
below URL // 
dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
 stmt.setFetchSize(Integer.MIN_VALUE) }
else
{ stmt.setFetchSize(100) }
{code}
 

 

  was:
JDBCRDD.scala:  stmt.setFetchSize(options.fetchSize)

 
JdbcRDD.scala:
```
if (url.startsWith("jdbc:mysql:")) {
// setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force
// streaming results, rather than pulling entire resultset into memory.
// See the below URL
// 
dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html

stmt.setFetchSize(Integer.MIN_VALUE)
} else {
stmt.setFetchSize(100)
}
```


> [BUG] MySQL jdbc cursor has not taken effect
> 
>
> Key: SPARK-43776
> URL: https://issues.apache.org/jira/browse/SPARK-43776
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: melin
>Priority: Major
>
> JDBCRDD.scala:  stmt.setFetchSize(options.fetchSize)
>  
> JdbcRDD.scala:
> {code:java}
> if (url.startsWith("jdbc:mysql:"))
> { // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force 
> // streaming results, rather than pulling entire resultset into memory. // 
> See the below URL // 
> dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
>  stmt.setFetchSize(Integer.MIN_VALUE) }
> else
> { stmt.setFetchSize(100) }
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43748) Support DISTINCT ON

2023-05-23 Thread melin (Jira)
melin created SPARK-43748:
-

 Summary: Support DISTINCT ON
 Key: SPARK-43748
 URL: https://issues.apache.org/jira/browse/SPARK-43748
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.5.0
Reporter: melin


ref: 
https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-select-distinct/
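PostgreSQL's DISTINCT ON keeps one row per key; the common Spark workaround is a window with row_number. A sketch with inline sample data:
{code:scala}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Sketch of the usual workaround for PostgreSQL's
//   SELECT DISTINCT ON (customer_id) * FROM orders ORDER BY customer_id, order_date DESC
// i.e. keep one row per customer_id, the one with the latest order_date.
val orders = spark.createDataFrame(Seq(
  (1, "2023-05-01", 10.0),
  (1, "2023-05-20", 25.0),
  (2, "2023-05-03", 7.5)
)).toDF("customer_id", "order_date", "amount")

val w = Window.partitionBy("customer_id").orderBy(col("order_date").desc)
val distinctOn = orders
  .withColumn("rn", row_number().over(w))
  .filter(col("rn") === 1)
  .drop("rn")
{code}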



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-05-22 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724942#comment-17724942
 ] 

melin commented on SPARK-43338:
---

Verified with Kyuubi: 
[https://kyuubi.readthedocs.io/en/v1.7.1-rc0/connector/spark/hive.html]
 
Kyuubi's connector is implemented on top of HiveSessionCatalog. If there are Hudi 
tables in the Hive database, another Hudi catalog needs to be registered, so the same 
HMS ends up with two catalog names, which does not meet my requirements.
 

> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages multiple hms metadata 
> and classifies them by catalogName. A different catalog name is required
> [~fanjia] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-05-22 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724933#comment-17724933
 ] 

melin commented on SPARK-43338:
---

I don't need to access multiple HMS instances in the same SparkSession; I only need 
to access one of them. Each HMS is assigned a unique catalog name only so that the 
metadata tableId is unique: catalog.database.table.

> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages multiple hms metadata 
> and classifies them by catalogName. A different catalog name is required
> [~fanjia] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37351) Supports write data flow control

2023-05-22 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-37351:
--
Description: 
Hive table data is often written to a relational database, generally an online 
production database. If the write speed has no traffic control, it can easily 
affect the stability of the online system. It is recommended to add traffic 
control (rate limiting) parameters.

[~fanjia] 

  was:The hive table data is written to a relational database, generally an 
online production database. If the writing speed has no traffic control, it can 
easily affect the stability of the online system. It is recommended to add 
traffic control parameters


> Supports write data flow control
> 
>
> Key: SPARK-37351
> URL: https://issues.apache.org/jira/browse/SPARK-37351
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: melin
>Priority: Major
>
> The hive table data is written to a relational database, generally an online 
> production database. If the writing speed has no traffic control, it can 
> easily affect the stability of the online system. It is recommended to add 
> traffic control parameters
> [~fanjia] 
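One way to approximate this today, outside of any built-in parameter, is a per-partition rate limiter inside the write loop; a sketch assuming Guava is on the classpath, with the rate and the actual insert call as placeholders:
{code:scala}
import com.google.common.util.concurrent.RateLimiter
import org.apache.spark.sql.Row

// Sketch of throttling a JDBC-style write: cap each partition at N rows per second.
// The limit and the actual insert call are placeholders; df is any DataFrame to write.
val rowsPerSecondPerPartition = 500

df.rdd.foreachPartition { rows: Iterator[Row] =>
  val limiter = RateLimiter.create(rowsPerSecondPerPartition)  // Guava, permits per second
  rows.foreach { row =>
    limiter.acquire()   // blocks until the next permit is available
    // write `row` to the target database here, e.g. add it to a JDBC batch
  }
}
{code}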



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-05-22 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43338:
--
Description: 
{code:java}
private[sql] object CatalogManager {
val SESSION_CATALOG_NAME: String = "spark_catalog"
}{code}
 
The SESSION_CATALOG_NAME value cannot be modified.

If multiple Hive Metastores exist, the platform manages multiple hms metadata 
and classifies them by catalogName. A different catalog name is required

[~fanjia] 

  was:
{code:java}
private[sql] object CatalogManager {
val SESSION_CATALOG_NAME: String = "spark_catalog"
}{code}
 
The SESSION_CATALOG_NAME value cannot be modified。

If multiple Hive Metastores exist, the platform manages multiple hms metadata 
and classifies them by catalogName. A different catalog name is required

[~yao] 


> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages multiple hms metadata 
> and classifies them by catalogName. A different catalog name is required
> [~fanjia] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43521) Support CREATE TABLE LIKE FILE for PARQUET

2023-05-16 Thread melin (Jira)
melin created SPARK-43521:
-

 Summary: Support CREATE TABLE LIKE FILE for PARQUET
 Key: SPARK-43521
 URL: https://issues.apache.org/jira/browse/SPARK-43521
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: melin


ref: https://issues.apache.org/jira/browse/HIVE-26395
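A sketch of the usual workaround until such syntax exists: infer the schema from an existing Parquet file and splice it into the DDL (the path and table name are placeholders):
{code:scala}
// Sketch of a current workaround for CREATE TABLE ... LIKE FILE: infer the schema
// from an existing Parquet file and reuse it in the DDL. Path and names are placeholders.
val fileSchema = spark.read.parquet("/data/samples/events.parquet").schema

// StructType.toDDL renders "col1 INT, col2 STRING, ..." suitable for CREATE TABLE.
spark.sql(s"CREATE TABLE events_like_file (${fileSchema.toDDL}) USING parquet")
{code}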



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43400) Add Primary Key syntax support

2023-05-10 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43400:
--
Summary: Add Primary Key syntax support  (was: create table support the 
PRIMARY KEY keyword)

> Add Primary Key syntax support
> --
>
> Key: SPARK-43400
> URL: https://issues.apache.org/jira/browse/SPARK-43400
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> apache paimon and hudi support primary key definitions. It is necessary to 
> support the primary key definition syntax
> https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint#constraint-properties
> [~gurwls223] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-05-09 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720831#comment-17720831
 ] 

melin edited comment on SPARK-43338 at 5/9/23 7:38 AM:
---

If the same Hive database has both Parquet and Hudi tables, does HiveTableCatalog 
support access to the Hudi tables? I do not want to register two catalogs.


was (Author: melin):
If the same hive database has parquet and hudi table, does HiveTableCatalog 
support access to hudi table?

> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages multiple hms metadata 
> and classifies them by catalogName. A different catalog name is required
> [~yao] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-05-09 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720831#comment-17720831
 ] 

melin commented on SPARK-43338:
---

If the same hive database has parquet and hudi table, does HiveTableCatalog 
support access to hudi table?

> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages multiple hms metadata 
> and classifies them by catalogName. A different catalog name is required
> [~yao] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-05-09 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720794#comment-17720794
 ] 

melin commented on SPARK-43338:
---

You are reading too much into the feature. Only one HMS is accessed in a 
SparkSession; I just want the spark_catalog name to be modifiable. For example, if 
you have two Hadoop clusters, there are two HMS instances. A metadata management 
platform (similar to Databricks Unity Catalog) that harvests HMS metadata needs to 
add a catalogName to distinguish them uniquely (tableId: 
catalogName.schemaName.tableName). When Spark accesses Hive tables, the catalog 
should match the catalogName of the tableId instead of spark_catalog.

 

> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages multiple hms metadata 
> and classifies them by catalogName. A different catalog name is required
> [~yao] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43400) create table support the PRIMARY KEY keyword

2023-05-08 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43400:
--
Description: 
apache paimon and hudi support primary key definitions. It is necessary to 
support the primary key definition syntax

https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint#constraint-properties

[~gurwls223] 

  was:
apache paimon and hudi support primary key definitions. It is necessary to 
support the primary key definition syntax

[~gurwls223] 


> create table support the PRIMARY KEY keyword
> 
>
> Key: SPARK-43400
> URL: https://issues.apache.org/jira/browse/SPARK-43400
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> apache paimon and hudi support primary key definitions. It is necessary to 
> support the primary key definition syntax
> https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint#constraint-properties
> [~gurwls223] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43400) create table support the PRIMARY KEY keyword

2023-05-08 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43400:
--
Description: 
apache paimon and hudi support primary key definitions. It is necessary to 
support the primary key definition syntax

[~gurwls223] 

  was:apache paimon and hudi support primary key definitions. It is necessary 
to support the primary key definition syntax


> create table support the PRIMARY KEY keyword
> 
>
> Key: SPARK-43400
> URL: https://issues.apache.org/jira/browse/SPARK-43400
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> apache paimon and hudi support primary key definitions. It is necessary to 
> support the primary key definition syntax
> [~gurwls223] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43400) create table support the PRIMARY KEY keyword

2023-05-07 Thread melin (Jira)
melin created SPARK-43400:
-

 Summary: create table support the PRIMARY KEY keyword
 Key: SPARK-43400
 URL: https://issues.apache.org/jira/browse/SPARK-43400
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: melin


apache paimon and hudi support primary key definitions. It is necessary to 
support the primary key definition syntax



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported

2023-05-07 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720110#comment-17720110
 ] 

melin edited comment on SPARK-43382 at 5/7/23 8:35 AM:
---

One idea is to implement a custom Hadoop filesystem based on Apache Commons VFS, 
since Commons VFS supports reading different archive formats.

simple demo:  
[https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/]
{code:java}
spark.read.option("header", "true")
      .csv("vfs://tgz:ftp://fcftp:fcftp@172.18.1.52/csv.tar.gz!/csv;).show()

spark.read.option("header", "true")
.csv("vfs://tgz:s3://BxiljVd5YZa3mRUn:3Mq9dsmdMbN1JipE1TlOF7OuDkuYBYpe@cdh1:9300/demo-bucket/csv.tar.gz!/csv").show()
    

spark.read.option("header", "true")
.csv("vfs://tgz:sftp:///test:test2023@172.18.5.46:22/ftpdata/csv.tar.gz!/csv;).show()
 {code}
 


was (Author: melin):
There is an idea to customize the hadoop filesystem based on common vfs. The 
common vfs supports reading different archive files.

simple demo: 
[https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/|https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala]

 
{code:java}
spark.read.option("header", "true")
      .csv("vfs://tgz:ftp://fcftp:fcftp@172.18.1.52/csv.tar.gz!/csv;).show()

spark.read.option("header", "true")
.csv("vfs://tgz:s3://BxiljVd5YZa3mRUn:3Mq9dsmdMbN1JipE1TlOF7OuDkuYBYpe@cdh1:9300/demo-bucket/csv.tar.gz!/csv").show()
    

spark.read.option("header", "true")
.csv("vfs://tgz:sftp:///test:test2023@172.18.5.46:22/ftpdata/csv.tar.gz!/csv;).show()
 {code}
 

> Read and write csv and json files. Archive files such as zip or gz are 
> supported
> 
>
> Key: SPARK-43382
> URL: https://issues.apache.org/jira/browse/SPARK-43382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> snowflake data import and export, support fixed files. For example:
>  
> {code:java}
> COPY INTO @mystage/data.csv.gz 
>  
> COPY INTO mytable 
> FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
> FILE_FORMAT = (TYPE = 'JSON') 
> MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
>  
> {code}
> Can spark directly read archive files?
> {code:java}
> spark.read.csv("/tutorials/dataloading/sales.json.gz")
> {code}
> @[~kaifeiYi] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported

2023-05-07 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720110#comment-17720110
 ] 

melin edited comment on SPARK-43382 at 5/7/23 8:34 AM:
---

There is an idea to customize the hadoop filesystem based on common vfs. The 
common vfs supports reading different archive files.

simple demo: 
[https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/|https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala]

 
{code:java}
spark.read.option("header", "true")
      .csv("vfs://tgz:ftp://fcftp:fcftp@172.18.1.52/csv.tar.gz!/csv;).show()

spark.read.option("header", "true")
.csv("vfs://tgz:s3://BxiljVd5YZa3mRUn:3Mq9dsmdMbN1JipE1TlOF7OuDkuYBYpe@cdh1:9300/demo-bucket/csv.tar.gz!/csv").show()
    

spark.read.option("header", "true")
.csv("vfs://tgz:sftp:///test:test2023@172.18.5.46:22/ftpdata/csv.tar.gz!/csv;).show()
 {code}
 


was (Author: melin):
There is an idea to customize the hadoop filesystem based on common vfs. The 
common vfs supports reading different archive files.

simple demo: 
[https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala]

spark.read.option("header", "true")
      .csv("vfs://tgz:ftp://fcftp:fcftp@172.18.1.52/csv.tar.gz!/csv;).show()

> Read and write csv and json files. Archive files such as zip or gz are 
> supported
> 
>
> Key: SPARK-43382
> URL: https://issues.apache.org/jira/browse/SPARK-43382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> snowflake data import and export, support fixed files. For example:
>  
> {code:java}
> COPY INTO @mystage/data.csv.gz 
>  
> COPY INTO mytable 
> FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
> FILE_FORMAT = (TYPE = 'JSON') 
> MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
>  
> {code}
> Can spark directly read archive files?
> {code:java}
> spark.read.csv("/tutorials/dataloading/sales.json.gz")
> {code}
> @[~kaifeiYi] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported

2023-05-07 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43382:
--
Description: 
snowflake data import and export, support fixed files. For example:

 
{code:java}
COPY INTO @mystage/data.csv.gz 
 
COPY INTO mytable 
FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
FILE_FORMAT = (TYPE = 'JSON') 
MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
 
{code}
Can spark directly read archive files?
{code:java}
spark.read.csv("/tutorials/dataloading/sales.json.gz")

{code}
@[~kaifeiYi] 

  was:
snowflake data import and export, support fixed files. For example:

 
{code:java}
COPY INTO @mystage/data.csv.gz 
 
COPY INTO mytable 
FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
FILE_FORMAT = (TYPE = 'JSON') 
MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
 
{code}
Can spark directly read archive files?
{code:java}
spark.read.csv("/tutorials/dataloading/sales.json.gz"){code}
@[~kaifeiYi] 


> Read and write csv and json files. Archive files such as zip or gz are 
> supported
> 
>
> Key: SPARK-43382
> URL: https://issues.apache.org/jira/browse/SPARK-43382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> snowflake data import and export, support fixed files. For example:
>  
> {code:java}
> COPY INTO @mystage/data.csv.gz 
>  
> COPY INTO mytable 
> FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
> FILE_FORMAT = (TYPE = 'JSON') 
> MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
>  
> {code}
> Can spark directly read archive files?
> {code:java}
> spark.read.csv("/tutorials/dataloading/sales.json.gz")
> {code}
> @[~kaifeiYi] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported

2023-05-06 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720110#comment-17720110
 ] 

melin edited comment on SPARK-43382 at 5/6/23 3:57 PM:
---

There is an idea to customize the hadoop filesystem based on common vfs. The 
common vfs supports reading different archive files.

simple demo: 
[https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala]

spark.read.option("header", "true")
      .csv("vfs://tgz:ftp://fcftp:fcftp@172.18.1.52/csv.tar.gz!/csv;).show()


was (Author: melin):
There is an idea to customize the hadoop filesystem based on common vfs. The 
common vfs supports reading different archive files.

 

simple demo: 
[https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala]

> Read and write csv and json files. Archive files such as zip or gz are 
> supported
> 
>
> Key: SPARK-43382
> URL: https://issues.apache.org/jira/browse/SPARK-43382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> snowflake data import and export, support fixed files. For example:
>  
> {code:java}
> COPY INTO @mystage/data.csv.gz 
>  
> COPY INTO mytable 
> FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
> FILE_FORMAT = (TYPE = 'JSON') 
> MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
>  
> {code}
> Can spark directly read archive files?
> {code:java}
> spark.read.csv("/tutorials/dataloading/sales.json.gz"){code}
> @[~kaifeiYi] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported

2023-05-06 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720110#comment-17720110
 ] 

melin edited comment on SPARK-43382 at 5/6/23 3:57 PM:
---

There is an idea to customize the hadoop filesystem based on common vfs. The 
common vfs supports reading different archive files.

 

simple demo: 
[https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala]


was (Author: melin):
There is an idea to customize the hadoop filesystem based on common vfs. The 
common vfs supports reading different archive files.

> Read and write csv and json files. Archive files such as zip or gz are 
> supported
> 
>
> Key: SPARK-43382
> URL: https://issues.apache.org/jira/browse/SPARK-43382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> snowflake data import and export, support fixed files. For example:
>  
> {code:java}
> COPY INTO @mystage/data.csv.gz 
>  
> COPY INTO mytable 
> FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
> FILE_FORMAT = (TYPE = 'JSON') 
> MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
>  
> {code}
> Can spark directly read archive files?
> {code:java}
> spark.read.csv("/tutorials/dataloading/sales.json.gz"){code}
> @[~kaifeiYi] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported

2023-05-06 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720110#comment-17720110
 ] 

melin commented on SPARK-43382:
---

There is an idea to customize the hadoop filesystem based on common vfs. The 
common vfs supports reading different archive files.

> Read and write csv and json files. Archive files such as zip or gz are 
> supported
> 
>
> Key: SPARK-43382
> URL: https://issues.apache.org/jira/browse/SPARK-43382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> snowflake data import and export, support fixed files. For example:
>  
> {code:java}
> COPY INTO @mystage/data.csv.gz 
>  
> COPY INTO mytable 
> FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
> FILE_FORMAT = (TYPE = 'JSON') 
> MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
>  
> {code}
> Can spark directly read archive files?
> {code:java}
> spark.read.csv("/tutorials/dataloading/sales.json.gz"){code}
> @[~kaifeiYi] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported

2023-05-05 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43382:
--
Description: 
snowflake data import and export, support fixed files. For example:

 
{code:java}
COPY INTO @mystage/data.csv.gz 
 
COPY INTO mytable 
FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
FILE_FORMAT = (TYPE = 'JSON') 
MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
 
{code}
Can spark directly read archive files?
{code:java}
spark.read.csv("/tutorials/dataloading/sales.json.gz"){code}
@[~kaifeiYi] 

  was:
snowflake data import and export, support fixed files. For example:

 
{code:java}
COPY INTO @mystage/data.csv.gz 
 
COPY INTO mytable 
FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
FILE_FORMAT = (TYPE = 'JSON') 
MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
 
{code}
Can spark directly read archive files?
{code:java}
spark.read.csv("/tutorials/dataloading/sales.json.gz"){code}
 


> Read and write csv and json files. Archive files such as zip or gz are 
> supported
> 
>
> Key: SPARK-43382
> URL: https://issues.apache.org/jira/browse/SPARK-43382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> snowflake data import and export, support fixed files. For example:
>  
> {code:java}
> COPY INTO @mystage/data.csv.gz 
>  
> COPY INTO mytable 
> FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
> FILE_FORMAT = (TYPE = 'JSON') 
> MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
>  
> {code}
> Can spark directly read archive files?
> {code:java}
> spark.read.csv("/tutorials/dataloading/sales.json.gz"){code}
> @[~kaifeiYi] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported

2023-05-05 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43382:
--
Description: 
snowflake data import and export, support fixed files. For example:

 
{code:java}
COPY INTO @mystage/data.csv.gz 
 
COPY INTO mytable 
FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
FILE_FORMAT = (TYPE = 'JSON') 
MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
 
{code}
Can spark directly read archive files?
{code:java}
spark.read.csv("/tutorials/dataloading/sales.json.gz"){code}
 

> Read and write csv and json files. Archive files such as zip or gz are 
> supported
> 
>
> Key: SPARK-43382
> URL: https://issues.apache.org/jira/browse/SPARK-43382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> snowflake data import and export, support fixed files. For example:
>  
> {code:java}
> COPY INTO @mystage/data.csv.gz 
>  
> COPY INTO mytable 
> FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
> FILE_FORMAT = (TYPE = 'JSON') 
> MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
>  
> {code}
> Can spark directly read archive files?
> {code:java}
> spark.read.csv("/tutorials/dataloading/sales.json.gz"){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported

2023-05-05 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43382:
--
Environment: (was: snowflake data import and export, support fixed 
files. For example: 

 
{code:java}
COPY INTO @mystage/data.csv.gz 
 
COPY INTO mytable 
FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
FILE_FORMAT = (TYPE = 'JSON') 
MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
 
{code}
Can spark directly read archive files? 
{code:java}
spark.read.csv("/tutorials/dataloading/sales.json.gz"){code}
 )

> Read and write csv and json files. Archive files such as zip or gz are 
> supported
> 
>
> Key: SPARK-43382
> URL: https://issues.apache.org/jira/browse/SPARK-43382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported

2023-05-05 Thread melin (Jira)
melin created SPARK-43382:
-

 Summary: Read and write csv and json files. Archive files such as 
zip or gz are supported
 Key: SPARK-43382
 URL: https://issues.apache.org/jira/browse/SPARK-43382
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
 Environment: snowflake data import and export, support fixed files. 
For example: 

 
{code:java}
COPY INTO @mystage/data.csv.gz 
 
COPY INTO mytable 
FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; 
FILE_FORMAT = (TYPE = 'JSON') 
MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; 
 
{code}
Can spark directly read archive files? 
{code:java}
spark.read.csv("/tutorials/dataloading/sales.json.gz"){code}
 
Reporter: melin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-05-04 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43338:
--
Description: 
{code:java}
private[sql] object CatalogManager {
val SESSION_CATALOG_NAME: String = "spark_catalog"
}{code}
 
The SESSION_CATALOG_NAME value cannot be modified.

If multiple Hive Metastores exist, the platform manages metadata from several HMS 
instances and distinguishes them by catalog name, so a different catalog name is required

[~yao] 
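A sketch of the configuration-based alternative, on the assumption that each extra HMS can be wrapped in a V2 catalog plugin: additional catalogs can be registered next to spark_catalog, even though the built-in session catalog name itself stays fixed. The catalog name, implementation class and its uri option below are hypothetical.
{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch only: "hive_prod" is a hypothetical catalog name and MyHmsCatalog a hypothetical
// CatalogPlugin implementation; the built-in spark_catalog itself is not renamed.
val spark = SparkSession.builder()
  .config("spark.sql.catalog.hive_prod", "org.example.MyHmsCatalog")
  .config("spark.sql.catalog.hive_prod.uri", "thrift://hms-prod:9083")
  .getOrCreate()

spark.sql("SELECT * FROM hive_prod.sales_db.orders").show()
{code}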

  was:
{code:java}
private[sql] object CatalogManager {
val SESSION_CATALOG_NAME: String = "spark_catalog"
}{code}
 
The SESSION_CATALOG_NAME value cannot be modified。

If multiple Hive Metastores exist, the platform manages multiple hms metadata 
and classifies them by catalogName. A different catalog name is required

 

[~gurwls223] 


> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages metadata from several HMS 
> instances and distinguishes them by catalog name, so a different catalog name is required
> [~yao] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-05-02 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43338:
--
Description: 
{code:java}
private[sql] object CatalogManager {
val SESSION_CATALOG_NAME: String = "spark_catalog"
}{code}
 
The SESSION_CATALOG_NAME value cannot be modified。

If multiple Hive Metastores exist, the platform manages multiple hms metadata 
and classifies them by catalogName. A different catalog name is required

 

[~gurwls223] 

  was:
{code:java}
private[sql] object CatalogManager {
val SESSION_CATALOG_NAME: String = "spark_catalog"
}{code}
 
The SESSION_CATALOG_NAME value cannot be modified。

If multiple Hive Metastores exist, the platform manages multiple hms metadata 
and classifies them by catalogName. A different catalog name is required


> Support  modify the SESSION_CATALOG_NAME value
> --
>
> Key: SPARK-43338
> URL: https://issues.apache.org/jira/browse/SPARK-43338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
> val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
>  
> The SESSION_CATALOG_NAME value cannot be modified。
> If multiple Hive Metastores exist, the platform manages multiple hms metadata 
> and classifies them by catalogName. A different catalog name is required
>  
> [~gurwls223] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value

2023-05-02 Thread melin (Jira)
melin created SPARK-43338:
-

 Summary: Support  modify the SESSION_CATALOG_NAME value
 Key: SPARK-43338
 URL: https://issues.apache.org/jira/browse/SPARK-43338
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: melin


{code:java}
private[sql] object CatalogManager {
val SESSION_CATALOG_NAME: String = "spark_catalog"
}{code}
 
The SESSION_CATALOG_NAME value cannot be modified。

If multiple Hive Metastores exist, the platform manages multiple hms metadata 
and classifies them by catalogName. A different catalog name is required



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43318) spark reader csv and json support wholetext parameters

2023-04-28 Thread melin (Jira)
melin created SPARK-43318:
-

 Summary: spark reader csv and json support wholetext parameters
 Key: SPARK-43318
 URL: https://issues.apache.org/jira/browse/SPARK-43318
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: melin
 Fix For: 3.5.0


FTPInputStream, used by Hadoop's FTPFileSystem, does not support seek, so Spark's 
HadoopFileLinesReader fails to read the file. 

Support reading the entire file and then splitting it into lines, to avoid the read failure

 

[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPInputStream.java]

 

[~cloud_fan] 
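A sketch of the whole-file workaround, assuming the FTP FileSystem is configured and using a hypothetical URI: the text source's existing wholetext option reads each file as a single record (so no seek is needed), and the lines are split afterwards.
{code:scala}
import org.apache.spark.sql.functions.{col, explode, split}

// Sketch: read each file as one record (no seek required), then split it into lines.
// The FTP URI is hypothetical and assumes the fs.ftp.* configuration is in place.
val lines = spark.read
  .option("wholetext", "true")
  .text("ftp://ftp.example.com/data/sales.csv")
  .select(explode(split(col("value"), "\n")).as("line"))

lines.show(truncate = false)
{code}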



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43101) Add CREATE/DROP catalog

2023-04-12 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43101:
--
Description: 
Convenient registration of catalogs at runtime, e.g. in the Spark Thrift Server (STS)

ref: [https://github.com/trinodb/trino/issues/12709]

 

  was:
Convenient registration of the catalog, in sts

ref: https://github.com/trinodb/trino/pull/13931


> Add CREATE/DROP catalog 
> 
>
> Key: SPARK-43101
> URL: https://issues.apache.org/jira/browse/SPARK-43101
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> Convenient registration of the catalog, in sts
> ref: [https://github.com/trinodb/trino/issues/12709]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43101) Dynamic Catalogs

2023-04-12 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43101:
--
Summary: Dynamic Catalogs  (was: Add CREATE/DROP catalog )

> Dynamic Catalogs
> 
>
> Key: SPARK-43101
> URL: https://issues.apache.org/jira/browse/SPARK-43101
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> Convenient registration of the catalog, in sts
> ref: [https://github.com/trinodb/trino/issues/12709]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43101) Add CREATE/DROP catalog

2023-04-12 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-43101:
--
Description: 
Convenient registration of the catalog, in sts

ref: [https://github.com/trinodb/trino/issues/12709]

  was:
Convenient registration of the catalog, in sts

ref: [https://github.com/trinodb/trino/issues/12709]

 


> Add CREATE/DROP catalog 
> 
>
> Key: SPARK-43101
> URL: https://issues.apache.org/jira/browse/SPARK-43101
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: melin
>Priority: Major
>
> Convenient registration of the catalog, in sts
> ref: [https://github.com/trinodb/trino/issues/12709]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43101) Add CREATE/DROP catalog

2023-04-11 Thread melin (Jira)
melin created SPARK-43101:
-

 Summary: Add CREATE/DROP catalog 
 Key: SPARK-43101
 URL: https://issues.apache.org/jira/browse/SPARK-43101
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: melin


Convenient registration of the catalog, in sts

ref: https://github.com/trinodb/trino/pull/13931



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-04-06 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709588#comment-17709588
 ] 

melin commented on SPARK-38200:
---

[~beliefer] 

MERGE INTO is standard SQL: https://en.wikipedia.org/wiki/Merge_%28SQL%29

MySQL does not implement it, but most databases do
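A minimal sketch of the usual workaround until such a save mode exists, assuming a DataFrame named `df`; the PostgreSQL connection details, table and columns are hypothetical, and the JDBC driver must be on the executor classpath.
{code:scala}
import java.sql.DriverManager
import org.apache.spark.sql.DataFrame

// Sketch of the common workaround: upsert each partition through a hand-written,
// dialect-specific statement (PostgreSQL shown). URL, credentials, table and columns
// are hypothetical.
def upsertToPostgres(df: DataFrame): Unit = {
  df.rdd.foreachPartition { rows =>
    val conn = DriverManager.getConnection("jdbc:postgresql://db:5432/demo", "user", "secret")
    val ps = conn.prepareStatement(
      "INSERT INTO t (id, name) VALUES (?, ?) " +
      "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name")
    try {
      rows.foreach { r =>
        ps.setLong(1, r.getAs[Long]("id"))
        ps.setString(2, r.getAs[String]("name"))
        ps.addBatch()
      }
      ps.executeBatch()
    } finally {
      ps.close()
      conn.close()
    }
  }
}
{code}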

> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> Upsert SQL differs across databases; most databases support MERGE SQL:
> sqlserver merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]
> mysql: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]
> oracle merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]
> postgres: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]
> postgres merge into sql : 
> [https://www.postgresql.org/docs/current/sql-merge.html]
> db2 merge into sql : 
> [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]
> derby merge into sql: 
> [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]
> h2 merge into sql : 
> [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]
>  
> [~yao] 
>  
> https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43060) [SQL] Spark JDBC rate limitation

2023-04-06 Thread melin (Jira)
melin created SPARK-43060:
-

 Summary: [SQL] Spark JDBC rate limitation
 Key: SPARK-43060
 URL: https://issues.apache.org/jira/browse/SPARK-43060
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: melin


Spark JDBC reads from and writes to the database directly, which may affect 
database stability. Therefore, a rate-limit parameter is required.
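In the absence of a dedicated rate-limit parameter, a rough sketch of the levers that exist today, assuming a DataFrame named `df` (the JDBC URL, table and credentials are hypothetical): capping the write parallelism and the JDBC batch size bounds the pressure on the database, although it is not a true rate limit.
{code:scala}
// Sketch only: capping parallelism and batch size is the closest substitute for a real
// rate limit today. `df` is an assumed DataFrame; URL, table and credentials are hypothetical.
df.coalesce(4)                                      // at most 4 concurrent JDBC connections
  .write
  .format("jdbc")
  .option("url", "jdbc:postgresql://db:5432/demo")
  .option("dbtable", "orders_copy")
  .option("user", "user")
  .option("password", "secret")
  .option("batchsize", "500")                       // rows per JDBC batch
  .mode("append")
  .save()
{code}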



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-04-06 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-38200:
--
Description: 
Upsert SQL differs across databases; most databases support MERGE SQL:

sqlserver merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]

mysql: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]

oracle merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]

postgres: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]

postgres merge into sql : 
[https://www.postgresql.org/docs/current/sql-merge.html]

db2 merge into sql : 
[https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]

derby merge into sql: 
[https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]

h2 merge into sql : 
[https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]

 

[~yao] 

 

https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect

 

  was:
upsert sql for different databases, Most databases support merge sql:

sqlserver merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]

mysql: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]

oracle merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]

postgres: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]

postgres merg into sql : 
[https://www.postgresql.org/docs/current/sql-merge.html]

db2 merge into sql : 
[https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]

derby merge into sql: 
[https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]

he merg into sql : 
[https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]

 

[~yao] 

 


> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> Upsert SQL differs across databases; most databases support MERGE SQL:
> sqlserver merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]
> mysql: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]
> oracle merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]
> postgres: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]
> postgres merge into sql : 
> [https://www.postgresql.org/docs/current/sql-merge.html]
> db2 merge into sql : 
> [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]
> derby merge into sql: 
> [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]
> h2 merge into sql : 
> [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]
>  
> [~yao] 
>  
> https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: 

[jira] [Updated] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-04-06 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-38200:
--
Description: 
upsert sql for different databases, Most databases support merge sql:

sqlserver merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]

mysql: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]

oracle merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]

postgres: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]

postgres merg into sql : 
[https://www.postgresql.org/docs/current/sql-merge.html]

db2 merge into sql : 
[https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]

derby merge into sql: 
[https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]

he merg into sql : 
[https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]

 

[~yao] 

 

  was:
upsert sql for different databases, Most databases support merge sql:

sqlserver merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]

mysql: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]

oracle merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]

postgres: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]

postgres merg into sql : 
[https://www.postgresql.org/docs/current/sql-merge.html]

db2 merge into sql : 
[https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]

derby merge into sql: 
[https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]

he merg into sql : 
[https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]

 

[~maxgekk] 

 


> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> upsert sql for different databases, Most databases support merge sql:
> sqlserver merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]
> mysql: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]
> oracle merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]
> postgres: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]
> postgres merg into sql : 
> [https://www.postgresql.org/docs/current/sql-merge.html]
> db2 merge into sql : 
> [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]
> derby merge into sql: 
> [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]
> he merg into sql : 
> [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]
>  
> [~yao] 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle

2023-04-06 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-42627:
--
Description: 
 
{code:java}
Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL 
type -102
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
 
{code}
oracle driver
{code:xml}
<dependency>
    <groupId>com.oracle.database.jdbc</groupId>
    <artifactId>ojdbc8</artifactId>
    <version>21.9.0.0</version>
</dependency>
{code}
 

oracle sql:

 
{code:java}
CREATE TABLE "ORDERS" 
   (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
    "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
    "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
    "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
     PRIMARY KEY ("ORDER_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"  ENABLE, 
     SUPPLEMENTAL LOG DATA (ALL) COLUMNS
   ) SEGMENT CREATION IMMEDIATE 
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"
 
{code}
[~yao] 
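As a workaround sketch (not a fix in Spark itself), a custom JdbcDialect can map the unrecognized vendor type code -102, Oracle's TIMESTAMP WITH LOCAL TIME ZONE, to Spark's TimestampType before the read; the literal type code below is an assumption taken from the error above.
{code:scala}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, TimestampType}

// Workaround sketch: map the unrecognized Oracle vendor type code -102
// (TIMESTAMP WITH LOCAL TIME ZONE) to Spark's TimestampType before reading.
object OracleTsLtzDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    if (sqlType == -102) Some(TimestampType) else None
}

JdbcDialects.registerDialect(OracleTsLtzDialect)
{code}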

  was:
 
{code:java}
Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL 
type -102
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
 
{code}
oracle driver
{code:java}

    com.oracle.database.jdbc
    ojdbc8
    21.9.0.0
{code}
 

oracle sql:

 
{code:java}
CREATE TABLE "ORDERS" 
   (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
    "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
    "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
    "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
     PRIMARY KEY ("ORDER_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 

[jira] [Updated] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle

2023-04-02 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-42627:
--
Description: 
 
{code:java}
Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL 
type -102
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
 
{code}
oracle driver
{code:xml}
<dependency>
    <groupId>com.oracle.database.jdbc</groupId>
    <artifactId>ojdbc8</artifactId>
    <version>21.9.0.0</version>
</dependency>
{code}
 

oracle sql:

 
{code:java}
CREATE TABLE "ORDERS" 
   (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
    "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
    "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
    "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
     PRIMARY KEY ("ORDER_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"  ENABLE, 
     SUPPLEMENTAL LOG DATA (ALL) COLUMNS
   ) SEGMENT CREATION IMMEDIATE 
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"
 
{code}
[~maxgekk] 

  was:
 
{code:java}
Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL 
type -102
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
 
{code}
oracle driver
{code:java}

    com.oracle.database.jdbc
    ojdbc8
    21.9.0.0
{code}
 

oracle sql:

 
{code:java}
CREATE TABLE "ORDERS" 
   (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
    "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
    "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
    "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
     PRIMARY KEY ("ORDER_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 

[jira] [Comment Edited] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle

2023-04-02 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17707619#comment-17707619
 ] 

melin edited comment on SPARK-42627 at 4/2/23 6:50 AM:
---

The type TIMESTAMP (3) WITH LOCAL TIME ZONE is not supported

 

[~srowen] 


was (Author: melin):
not support type: TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE

 

[~srowen] 

> Spark: Getting SQLException: Unsupported type -102 reading from Oracle
> --
>
> Key: SPARK-42627
> URL: https://issues.apache.org/jira/browse/SPARK-42627
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: melin
>Priority: Major
>
>  
> {code:java}
> Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized 
> SQL type -102
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
>     at scala.Option.getOrElse(Option.scala:189)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
>     at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
>     at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
>     at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
>     at scala.Option.getOrElse(Option.scala:189)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
>  
> {code}
> oracle driver
> {code:xml}
> <dependency>
>     <groupId>com.oracle.database.jdbc</groupId>
>     <artifactId>ojdbc8</artifactId>
>     <version>21.9.0.0</version>
> </dependency>
> {code}
>  
> oracle sql:
>  
> {code:java}
> CREATE TABLE "ORDERS" 
>    (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
>     "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
>     "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
>     "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
>     "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
>     "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
>      PRIMARY KEY ("ORDER_ID")
>   USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
>   STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
>   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
> DEFAULT CELL_FLASH_CACHE DEFAULT)
>   TABLESPACE "LOGMINER_TBS"  ENABLE, 
>      SUPPLEMENTAL LOG DATA (ALL) COLUMNS
>    ) SEGMENT CREATION IMMEDIATE 
>   PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
>   STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
>   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
> DEFAULT CELL_FLASH_CACHE DEFAULT)
>   TABLESPACE "LOGMINER_TBS"
>  
> {code}
> [~beliefer] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle

2023-04-02 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17707619#comment-17707619
 ] 

melin commented on SPARK-42627:
---

The type TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE is not supported

 

[~srowen] 

> Spark: Getting SQLException: Unsupported type -102 reading from Oracle
> --
>
> Key: SPARK-42627
> URL: https://issues.apache.org/jira/browse/SPARK-42627
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: melin
>Priority: Major
>
>  
> {code:java}
> Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized 
> SQL type -102
>     at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
>     at scala.Option.getOrElse(Option.scala:189)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
>     at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
>     at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
>     at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
>     at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
>     at scala.Option.getOrElse(Option.scala:189)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
>  
> {code}
> oracle driver
> {code:xml}
> <dependency>
>     <groupId>com.oracle.database.jdbc</groupId>
>     <artifactId>ojdbc8</artifactId>
>     <version>21.9.0.0</version>
> </dependency>
> {code}
>  
> oracle sql:
>  
> {code:java}
> CREATE TABLE "ORDERS" 
>    (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
>     "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
>     "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
>     "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
>     "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
>     "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
>      PRIMARY KEY ("ORDER_ID")
>   USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
>   STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
>   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
> DEFAULT CELL_FLASH_CACHE DEFAULT)
>   TABLESPACE "LOGMINER_TBS"  ENABLE, 
>      SUPPLEMENTAL LOG DATA (ALL) COLUMNS
>    ) SEGMENT CREATION IMMEDIATE 
>   PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
>   STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
>   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
> DEFAULT CELL_FLASH_CACHE DEFAULT)
>   TABLESPACE "LOGMINER_TBS"
>  
> {code}
> [~beliefer] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-04-02 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-38200:
--
Description: 
upsert sql for different databases, Most databases support merge sql:

sqlserver merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]

mysql: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]

oracle merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]

postgres: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]

postgres merg into sql : 
[https://www.postgresql.org/docs/current/sql-merge.html]

db2 merge into sql : 
[https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]

derby merge into sql: 
[https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]

he merg into sql : 
[https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]

 

[~maxgekk] 

 

  was:
upsert sql for different databases, Most databases support merge sql:

sqlserver merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]

mysql: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]

oracle merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]

postgres: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]

postgres merg into sql : 
[https://www.postgresql.org/docs/current/sql-merge.html]

db2 merge into sql : 
[https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]

derby merge into sql: 
[https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]

he merg into sql : 
[https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]

[~beliefer] [~cloud_fan] 


> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> upsert sql for different databases, Most databases support merge sql:
> sqlserver merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]
> mysql: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]
> oracle merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]
> postgres: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]
> postgres merg into sql : 
> [https://www.postgresql.org/docs/current/sql-merge.html]
> db2 merge into sql : 
> [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]
> derby merge into sql: 
> [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]
> he merg into sql : 
> [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]
>  
> [~maxgekk] 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-03-27 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-38200:
--
Description: 
upsert sql for different databases, Most databases support merge sql:

sqlserver merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]

mysql: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]

oracle merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]

postgres: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]

postgres merg into sql : 
[https://www.postgresql.org/docs/current/sql-merge.html]

db2 merge into sql : 
[https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]

derby merge into sql: 
[https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]

he merg into sql : 
[https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]

[~beliefer] [~cloud_fan] 

  was:
When writing data into a relational database, data duplication needs to be 
considered. Both mysql and postgres support upsert syntax.

mysql:
{code:java}
replace into t(id, update_time) values(1, now()); {code}
pg:
{code:java}
INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT 
(id,name) DO UPDATE SET 
id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark
   {code}


> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> upsert sql for different databases, Most databases support merge sql:
> sqlserver merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]
> mysql: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]
> oracle merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]
> postgres: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]
> postgres merg into sql : 
> [https://www.postgresql.org/docs/current/sql-merge.html]
> db2 merge into sql : 
> [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]
> derby merge into sql: 
> [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]
> he merg into sql : 
> [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]
> [~beliefer] [~cloud_fan] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-03-27 Thread melin (Jira)


[ https://issues.apache.org/jira/browse/SPARK-38200 ]


melin deleted comment on SPARK-38200:
---

was (Author: melin):
upsert sql for different databases, Most databases support merge sql:

sqlserver merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]

mysql: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]

oracle merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]

postgres: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]

postgres merg into sql : 
[https://www.postgresql.org/docs/current/sql-merge.html]

db2 merge into sql : 
[https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]

derby merge into sql: 
[https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]

he merg into sql : 
[https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]

[~beliefer] [~cloud_fan] 

> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> When writing data into a relational database, data duplication needs to be 
> considered. Both mysql and postgres support upsert syntax.
> mysql:
> {code:java}
> replace into t(id, update_time) values(1, now()); {code}
> pg:
> {code:java}
> INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT 
> (id,name) DO UPDATE SET 
> id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark
>    {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-03-27 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-38200:
--
Summary: [SQL] Spark JDBC Savemode Supports Upsert  (was: [SQL] Spark JDBC 
Savemode Supports replace)

> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> When writing data into a relational database, data duplication needs to be 
> considered. Both mysql and postgres support upsert syntax.
> mysql:
> {code:java}
> replace into t(id, update_time) values(1, now()); {code}
> pg:
> {code:java}
> INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT 
> (id,name) DO UPDATE SET 
> id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark
>    {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace

2023-03-27 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705765#comment-17705765
 ] 

melin edited comment on SPARK-38200 at 3/28/23 3:00 AM:


upsert sql for different databases, Most databases support merge sql:

sqlserver merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]

mysql: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]

oracle merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]

postgres: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]

postgres merg into sql : 
[https://www.postgresql.org/docs/current/sql-merge.html]

db2 merge into sql : 
[https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]

derby merge into sql: 
[https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]

he merg into sql : 
[https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]

[~beliefer] [~cloud_fan] 


was (Author: melin):
upsert sql for different databases, Most databases support merge sql:

sqlserver merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]

mysql: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]

oracle merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]

postgres: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]

db2 merge into sql : 
[https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]

derby merge into sql: 
[https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]

he merg into sql : 
[https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]

[~beliefer] [~cloud_fan] 

> [SQL] Spark JDBC Savemode Supports replace
> --
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> When writing data into a relational database, data duplication needs to be 
> considered. Both mysql and postgres support upsert syntax.
> mysql:
> {code:java}
> replace into t(id, update_time) values(1, now()); {code}
> pg:
> {code:java}
> INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT 
> (id,name) DO UPDATE SET 
> id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark
>    {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace

2023-03-27 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705765#comment-17705765
 ] 

melin commented on SPARK-38200:
---

upsert sql for different databases, Most databases support merge sql:

sqlserver merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]

mysql: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]

oracle merge into sql : 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]

postgres: 
[https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]

db2 merge into sql : 
[https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]

derby merge into sql: 
[https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]

h2 merge into sql: 
[https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]

[~beliefer] [~cloud_fan] 
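
For illustration only, a sketch of the statement shapes those dialects generate for MySQL and PostgreSQL; the object and method names below are hypothetical, not an existing Spark or SeaTunnel API:
{code:scala}
// Hypothetical helpers; they only show the upsert statement shapes the linked dialects generate.
object UpsertSqlSketch {

  // MySQL: REPLACE INTO deletes a conflicting row and re-inserts the new one.
  def mysqlReplace(table: String, columns: Seq[String]): String = {
    val cols = columns.mkString(", ")
    val params = columns.map(_ => "?").mkString(", ")
    s"REPLACE INTO $table ($cols) VALUES ($params)"
  }

  // PostgreSQL: INSERT ... ON CONFLICT ... DO UPDATE updates the conflicting row in place.
  def postgresUpsert(table: String, columns: Seq[String], keys: Seq[String]): String = {
    val cols = columns.mkString(", ")
    val params = columns.map(_ => "?").mkString(", ")
    val updates = columns.map(c => s"$c = excluded.$c").mkString(", ")
    s"INSERT INTO $table ($cols) VALUES ($params) " +
      s"ON CONFLICT (${keys.mkString(", ")}) DO UPDATE SET $updates"
  }
}

// Example: UpsertSqlSketch.postgresUpsert("t", Seq("id", "name"), Seq("id")) produces
// INSERT INTO t (id, name) VALUES (?, ?) ON CONFLICT (id) DO UPDATE SET id = excluded.id, name = excluded.name
{code}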

> [SQL] Spark JDBC Savemode Supports replace
> --
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> When writing data into a relational database, data duplication needs to be 
> considered. Both mysql and postgres support upsert syntax.
> mysql:
> {code:java}
> replace into t(id, update_time) values(1, now()); {code}
> pg:
> {code:java}
> INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT 
> (id,name) DO UPDATE SET 
> id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark
>    {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle

2023-02-28 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-42627:
--
Description: 
 
{code:java}
Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL 
type -102
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
 
{code}
oracle driver:
{code:xml}
<dependency>
    <groupId>com.oracle.database.jdbc</groupId>
    <artifactId>ojdbc8</artifactId>
    <version>21.9.0.0</version>
</dependency>
{code}
 

oracle sql:

 
{code:java}
CREATE TABLE "ORDERS" 
   (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
    "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
    "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
    "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
     PRIMARY KEY ("ORDER_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"  ENABLE, 
     SUPPLEMENTAL LOG DATA (ALL) COLUMNS
   ) SEGMENT CREATION IMMEDIATE 
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"
 
{code}
[~beliefer] 
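
Type code -102 appears to be Oracle's vendor-specific OracleTypes.TIMESTAMPLTZ (the TIMESTAMP WITH LOCAL TIME ZONE column above), which JdbcUtils.getCatalystType does not recognize. Until the built-in OracleDialect handles it, one possible workaround is registering a custom dialect; a minimal, untested sketch:
{code:scala}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, TimestampType}

// Maps Oracle's vendor-specific type code -102 (TIMESTAMP WITH LOCAL TIME ZONE)
// to Spark's TimestampType; everything else falls through to the built-in OracleDialect.
object OracleTimestampLtzDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int,
      typeName: String,
      size: Int,
      md: MetadataBuilder): Option[DataType] = sqlType match {
    case -102 => Some(TimestampType)
    case _ => None
  }
}

// Register the dialect before spark.read.format("jdbc")... resolves the schema.
JdbcDialects.registerDialect(OracleTimestampLtzDialect)
{code}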

  was:
 
{code:java}
Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL 
type -102
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
 
{code}
oracle sql:

 
{code:java}
CREATE TABLE "ORDERS" 
   (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
    "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
    "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
    "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
     PRIMARY KEY ("ORDER_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE 

[jira] [Updated] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle

2023-02-28 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-42627:
--
Description: 
 
{code:java}
Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL 
type -102
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
 
{code}
oracle sql:

 
{code:java}
CREATE TABLE "ORDERS" 
   (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
    "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
    "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
    "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
     PRIMARY KEY ("ORDER_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"  ENABLE, 
     SUPPLEMENTAL LOG DATA (ALL) COLUMNS
   ) SEGMENT CREATION IMMEDIATE 
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"
 
{code}
[~beliefer] 

  was:
 
{code:java}
Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL 
type -102
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
 
{code}
oracle sql:

 
{code:java}
CREATE TABLE "ORDERS" 
   (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
    "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
    "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
    "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
     PRIMARY KEY ("ORDER_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"  ENABLE, 
     SUPPLEMENTAL LOG DATA (ALL) COLUMNS
   ) SEGMENT CREATION IMMEDIATE 

[jira] [Updated] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle

2023-02-28 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-42627:
--
Description: 
 
{code:java}
Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL 
type -102
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
 
{code}
oracle sql:

 
{code:java}
CREATE TABLE "ORDERS" 
   (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
    "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
    "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
    "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
     PRIMARY KEY ("ORDER_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"  ENABLE, 
     SUPPLEMENTAL LOG DATA (ALL) COLUMNS
   ) SEGMENT CREATION IMMEDIATE 
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"
 
{code}
 

  was:
```

Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL 
type -102
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)

```

 

oracle sql:

```sql

CREATE TABLE "ORDERS" 
   (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
    "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
    "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
    "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
     PRIMARY KEY ("ORDER_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"  ENABLE, 
     SUPPLEMENTAL LOG DATA (ALL) COLUMNS
   ) SEGMENT CREATION IMMEDIATE 
  PCTFREE 10 PCTUSED 40 

[jira] [Created] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle

2023-02-28 Thread melin (Jira)
melin created SPARK-42627:
-

 Summary: Spark: Getting SQLException: Unsupported type -102 
reading from Oracle
 Key: SPARK-42627
 URL: https://issues.apache.org/jira/browse/SPARK-42627
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.2
Reporter: melin


```

Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL 
type -102
    at 
org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308)
    at scala.Option.getOrElse(Option.scala:189)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)

```

 

oracle sql:

```sql

CREATE TABLE "ORDERS" 
   (    "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, 
    "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, 
    "PRICE" NUMBER(10,5) NOT NULL ENABLE, 
    "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, 
    "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, 
     PRIMARY KEY ("ORDER_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS 
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"  ENABLE, 
     SUPPLEMENTAL LOG DATA (ALL) COLUMNS
   ) SEGMENT CREATION IMMEDIATE 
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE 
DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "LOGMINER_TBS"

```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40568) Spark Streaming support Debezium

2022-09-26 Thread melin (Jira)
melin created SPARK-40568:
-

 Summary: Spark Streaming support Debezium
 Key: SPARK-40568
 URL: https://issues.apache.org/jira/browse/SPARK-40568
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: melin


Debezium is a very popular CDC technology. If Spark Structured Streaming supported 
Debezium, it would make it easy to write change data into data lakes.

The most commonly used scheme today is Flink CDC; it would be good if Spark supported this as well.
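
For reference, change events that Debezium publishes to Kafka can already be consumed with the existing Kafka source and unwrapped by hand; a rough, untested sketch (the broker address, topic name and payload schema below are assumptions):
{code:scala}
// Requires the spark-sql-kafka-0-10 connector on the classpath.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("debezium-cdc-sketch").getOrCreate()

// Simplified Debezium envelope: only the "after" image and the operation type are kept.
val afterSchema = new StructType()
  .add("id", LongType)
  .add("name", StringType)
val envelopeSchema = new StructType()
  .add("after", afterSchema)
  .add("op", StringType)
  .add("ts_ms", LongType)

val changes = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka:9092")          // assumed broker
  .option("subscribe", "dbserver1.inventory.customers")     // assumed Debezium topic
  .load()
  .select(from_json(col("value").cast("string"), envelopeSchema).as("evt"))
  .select(
    col("evt.after.id").as("id"),
    col("evt.after.name").as("name"),
    col("evt.op").as("op"))

// Stand-in sink; a real pipeline would write to a data lake table instead.
val query = changes.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/debezium-ckpt")       // assumed path
  .start()
{code}
Built-in support would remove the need to hand-maintain the envelope schema and the unwrapping logic.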



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40189) Support json_array_get/json_array_length function

2022-08-23 Thread melin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583473#comment-17583473
 ] 

melin commented on SPARK-40189:
---

[~maxgekk] 

> Support json_array_get/json_array_length function
> -
>
> Key: SPARK-40189
> URL: https://issues.apache.org/jira/browse/SPARK-40189
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: melin
>Priority: Major
>
> presto provides these two functions, which are frequently used:
> https://prestodb.io/docs/current/functions/json.html#json-functions



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40190) Support json_array_get and json_array_length function

2022-08-23 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin resolved SPARK-40190.
---
Resolution: Duplicate

> Support json_array_get and json_array_length function
> -
>
> Key: SPARK-40190
> URL: https://issues.apache.org/jira/browse/SPARK-40190
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: melin
>Priority: Major
>
> presto provides these two functions, which are often used:
> https://prestodb.io/docs/current/functions/json.html#json-functions



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40190) Support json_array_get and json_array_length function

2022-08-23 Thread melin (Jira)
melin created SPARK-40190:
-

 Summary: Support json_array_get and json_array_length function
 Key: SPARK-40190
 URL: https://issues.apache.org/jira/browse/SPARK-40190
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: melin


presto provides these two functions, which are often used:

https://prestodb.io/docs/current/functions/json.html#json-functions



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40189) Support json_array_get/json_array_length function

2022-08-23 Thread melin (Jira)
melin created SPARK-40189:
-

 Summary: Support json_array_get/json_array_length function
 Key: SPARK-40189
 URL: https://issues.apache.org/jira/browse/SPARK-40189
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: melin


presto provides these two functions, which are frequently used:

https://prestodb.io/docs/current/functions/json.html#json-functions
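
Until such built-ins are added, both can be approximated with existing functions; a rough sketch using literal JSON for illustration:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("json-array-sketch").getOrCreate()

// Element access via get_json_object with a JSONPath array index (expected result: 20).
spark.sql("SELECT get_json_object('[10, 20, 30]', '$[1]') AS second_element").show()

// Array length via from_json + size (expected result: 3).
spark.sql("SELECT size(from_json('[10, 20, 30]', 'array<int>')) AS array_length").show()
{code}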



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40184) Support modify the comment of a partitioned column

2022-08-22 Thread melin (Jira)
melin created SPARK-40184:
-

 Summary: Support modify the comment of a partitioned column
 Key: SPARK-40184
 URL: https://issues.apache.org/jira/browse/SPARK-40184
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: melin


A comment may not be added to a partition column when the table is created, and 
there is currently no way to modify the partition column's comment afterwards. 
Support modifying the comment of a partition column.
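
For reference, the statement shape this request implies; a sketch with made-up table and column names, assuming the existing ALTER COLUMN ... COMMENT syntax were extended to partition columns:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Accepted for ordinary columns today; the request is for the same statement to
// also update the comment of a partition column such as dt (names are made up).
spark.sql("ALTER TABLE sales.orders ALTER COLUMN dt COMMENT 'partition date, yyyyMMdd'")
{code}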



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40118) InMemoryFileIndex caches filelists, how to solve the problem that multiple sparksessions run for a long time and filelists is out of sync

2022-08-17 Thread melin (Jira)
melin created SPARK-40118:
-

 Summary: InMemoryFileIndex caches filelists, how to solve the 
problem that multiple sparksessions run for a long time and filelists is out of 
sync
 Key: SPARK-40118
 URL: https://issues.apache.org/jira/browse/SPARK-40118
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: melin


For example, with two SparkSessions A and B: query table T1 in A, write data to 
table T1 in B, and A fails to see the data written by B. There are currently two 
workarounds (see the sketch below):

1. Close SparkSession A and restart it.

2. Invoke the REFRESH TABLE command.

Neither is practical for business users, who do not know when to run them, and 
frequent refreshes hurt interactive performance.

Ideally, Spark would support a centralized cache (for example Redis) and provide 
an extension interface that allows the file-listing cache to be customized.
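
For reference, workaround 2 above looks like this from session A; a minimal sketch assuming a table named db.t1:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()   // session "A" in the example above

// Drop the cached file listing so the next scan re-lists files written by session "B"
// (the table name db.t1 is made up).
spark.catalog.refreshTable("db.t1")
// or equivalently via SQL:
spark.sql("REFRESH TABLE db.t1")

spark.table("db.t1").count()   // now sees the newly written files
{code}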



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39990) Restrict special characters in field name, which can be controlled by switches

2022-08-05 Thread melin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

melin updated SPARK-39990:
--
Description: 
The Hive metastore restricts field names to alphanumerics and underscores. If a 
custom catalog does not use HMS (for example, a custom metadata system based on 
Iceberg), these restrictions may not apply, e.g. when reading Excel data and 
writing it to an Iceberg table.

Hacky way to bypass the check:
{code:java}
@Around("execution(public * 
org.apache.spark.sql.execution.datasources.DataSourceUtils.checkFieldNames(..))")
public void checkFieldNames_1(ProceedingJoinPoint pjp) throws Throwable {
LOG.info("skip checkFieldNames 1");
}

@Around("execution(public * 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldNames(..))")
public void checkFieldNames_2(ProceedingJoinPoint pjp) throws Throwable {
LOG.info("skip checkFieldNames 2");
}

@Around("execution(public * 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldName(..))")
public void checkFieldNames_3(ProceedingJoinPoint pjp) throws Throwable
{ LOG.info("skip checkFieldNames 3"); }{code}
CREATE OR REPLACE TABLE huaixin_rp.bigdata.parquet_orders_rp5 USING ICEBERG  
select 12 as id, 'ceity' as `address(地  址)`

[~hyukjin.kwon] 

  was:
The hive metastore restricts field name to only contain alphanumerics and 
underscores. If the custom catalog does not use hms, Custom metadata system 
based on iceberg。these restrictions may not exist, such as reading excel data, 
writing iceberg table, and column names are prone to special characters such as 
spaces, parentheses, etc

hack way forbidden:
{code:java}
@Around("execution(public * 
org.apache.spark.sql.execution.datasources.DataSourceUtils.checkFieldNames(..))")
public void checkFieldNames_1(ProceedingJoinPoint pjp) throws Throwable {
LOG.info("skip checkFieldNames 1");
}

@Around("execution(public * 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldNames(..))")
public void checkFieldNames_2(ProceedingJoinPoint pjp) throws Throwable {
LOG.info("skip checkFieldNames 2");
}

@Around("execution(public * 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldName(..))")
public void checkFieldNames_3(ProceedingJoinPoint pjp) throws Throwable
{ LOG.info("skip checkFieldNames 3"); }{code}
CREATE OR REPLACE TABLE huaixin_rp.bigdata.parquet_orders_rp5 USING PARQUET  
select 12 as id, 'ceity' as `address(地  址)`

[~hyukjin.kwon] 


>  Restrict special characters in field name, which can be controlled by 
> switches
> ---
>
> Key: SPARK-39990
> URL: https://issues.apache.org/jira/browse/SPARK-39990
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: melin
>Priority: Major
>
> The Hive metastore restricts field names to alphanumerics and underscores. If a 
> custom catalog does not use HMS (for example, a custom metadata system based on 
> Iceberg), these restrictions may not apply, e.g. when reading Excel data and 
> writing it to an Iceberg table.
> Hacky way to bypass the check:
> {code:java}
> @Around("execution(public * 
> org.apache.spark.sql.execution.datasources.DataSourceUtils.checkFieldNames(..))")
> public void checkFieldNames_1(ProceedingJoinPoint pjp) throws Throwable {
> LOG.info("skip checkFieldNames 1");
> }
> @Around("execution(public * 
> org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldNames(..))")
> public void checkFieldNames_2(ProceedingJoinPoint pjp) throws Throwable {
> LOG.info("skip checkFieldNames 2");
> }
> @Around("execution(public * 
> org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldName(..))")
> public void checkFieldNames_3(ProceedingJoinPoint pjp) throws Throwable
> { LOG.info("skip checkFieldNames 3"); }{code}
> CREATE OR REPLACE TABLE huaixin_rp.bigdata.parquet_orders_rp5 USING ICEBERG  
> select 12 as id, 'ceity' as `address(地  址)`
> [~hyukjin.kwon] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


