[jira] [Commented] (SPARK-48286) Analyze 'exists' default expression instead of 'current' default expression in structField to v2 column conversion
[ https://issues.apache.org/jira/browse/SPARK-48286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852984#comment-17852984 ]

melin commented on SPARK-48286:
-------------------------------

The defaultValueNotConstantError method does not exist.

> Analyze 'exists' default expression instead of 'current' default expression
> in structField to v2 column conversion
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-48286
>                 URL: https://issues.apache.org/jira/browse/SPARK-48286
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Uros Stankovic
>            Assignee: Uros Stankovic
>            Priority: Trivial
>              Labels: pull-request-available
>             Fix For: 4.0.0, 3.5.2
>
>
> The org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze method
> accepts 3 parameters:
> 1) Field to analyze
> 2) Statement type - String
> 3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT
> The method org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column
> passes fieldToAnalyze and EXISTS_DEFAULT as the second parameter, so
> EXISTS_DEFAULT is treated as the statement type rather than the metadata
> key, and the wrong expression is analyzed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
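This mix-up is easy to make because both the statement type and the metadata key are plain Strings, so the misplaced argument still compiles. A self-contained Scala toy that mirrors the signature shape described above (names, defaults, and behavior are illustrative only, not the real Spark code):

{code:scala}
object DefaultColumnSketch {
  // Mirrors the shape described in the issue: both statementType and
  // metadataKey are Strings, so a misplaced argument still type-checks.
  def analyze(fieldName: String, statementType: String,
              metadataKey: String = "CURRENT_DEFAULT"): String =
    s"analyzing $metadataKey expression of $fieldName for statement '$statementType'"

  def main(args: Array[String]): Unit = {
    // Buggy call shape: EXISTS_DEFAULT lands in the statementType slot,
    // so the CURRENT_DEFAULT expression gets analyzed.
    println(analyze("col1", "EXISTS_DEFAULT"))
    // Fixed call shape: EXISTS_DEFAULT is passed as the metadata key.
    println(analyze("col1", "", metadataKey = "EXISTS_DEFAULT"))
  }
}
{code}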
[jira] [Updated] (SPARK-48450) Support Jdbc datasource custom data partitioning
[ https://issues.apache.org/jira/browse/SPARK-48450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-48450:
--------------------------
    Description: 
"partitionColumn, lowerBound, upperBound" is not an efficient table partitioning scheme for some databases; the amount of data in each partition should be consistent. For example, Oracle has more efficient table partitioning:

[https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76]

  was:
"partitionColumn, lowerBound, upperBound" is not an efficient table partitioning scheme for some databases. For example, Oracle has more efficient table partitioning, where the amount of data in each partition is consistent:

[https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76]


> Support Jdbc datasource custom data partitioning
> -------------------------------------------------
>
>                 Key: SPARK-48450
>                 URL: https://issues.apache.org/jira/browse/SPARK-48450
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: melin
>            Priority: Major
>
> "partitionColumn, lowerBound, upperBound" is not an efficient table
> partitioning scheme for some databases; the amount of data in each partition
> should be consistent. For example, Oracle has more efficient table partitioning:
> [https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48450) Support Jdbc datasource custom data partitioning
[ https://issues.apache.org/jira/browse/SPARK-48450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-48450:
--------------------------
    Description: 
"partitionColumn, lowerBound, upperBound" is not an efficient table partitioning scheme for some databases. For example, Oracle has more efficient table partitioning, where the amount of data in each partition is consistent:

[https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76]

  was:
"partitionColumn, lowerBound, upperBound" is not an efficient table partitioning scheme for some databases. For example, Oracle has more efficient table partitioning:

https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76


> Support Jdbc datasource custom data partitioning
> -------------------------------------------------
>
>                 Key: SPARK-48450
>                 URL: https://issues.apache.org/jira/browse/SPARK-48450
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: melin
>            Priority: Major
>
> "partitionColumn, lowerBound, upperBound" is not an efficient table
> partitioning scheme for some databases. For example, Oracle has more
> efficient table partitioning, where the amount of data in each partition is
> consistent:
> [https://github.com/apache/sqoop/blob/f8beae32a067d72bf9ed6e903b041ad347ca5491/src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java#L335C51-L335C76]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
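For reference, a minimal Scala sketch of the stride-based scheme the issue refers to, using Spark's standard JDBC reader options (the URL and table name are hypothetical). Spark splits [lowerBound, upperBound] into numPartitions equal strides on partitionColumn, so skewed column values produce skewed partitions:

{code:scala}
import org.apache.spark.sql.SparkSession

object JdbcStridePartitioning {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-partitioning").getOrCreate()

    // Spark divides [lowerBound, upperBound] into numPartitions equal-width
    // strides on partitionColumn. If ids cluster near one end of the range,
    // some partitions carry most of the data -- the motivation above for
    // pluggable, database-native partitioning.
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//host:1521/service") // hypothetical URL
      .option("dbtable", "orders")                            // hypothetical table
      .option("partitionColumn", "id")
      .option("lowerBound", "0")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")
      .load()

    println(df.rdd.getNumPartitions) // 8 partitions, equal ranges, not equal rows
    spark.stop()
  }
}
{code}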
[jira] [Created] (SPARK-48404) Driver and Executor support merge and run in a single jvm
melin created SPARK-48404:
------------------------------

             Summary: Driver and Executor support merge and run in a single jvm
                 Key: SPARK-48404
                 URL: https://issues.apache.org/jira/browse/SPARK-48404
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: melin


Spark is used in data integration scenarios (such as reading data from MySQL and writing it to other data sources), and in many cases such tasks can run with a single degree of parallelism. Today the Driver and Executor consume resources separately. If the Driver and Executor supported merging into a single JVM, compute costs could be saved, especially when running in the cloud.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
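As a point of reference, Spark's local mode already runs the scheduler and task execution inside one JVM; the request above goes further by asking for the same in cluster deployments. A minimal sketch:

{code:scala}
import org.apache.spark.sql.SparkSession

object SingleJvmSketch {
  def main(args: Array[String]): Unit = {
    // local[1] runs driver and task execution in this one JVM with a single
    // core -- enough for single-concurrency data-integration jobs, with no
    // separate executor resources to pay for.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("single-jvm-data-integration")
      .getOrCreate()

    spark.range(10).show()
    spark.stop()
  }
}
{code}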
[jira] [Created] (SPARK-47389) spark jdbc one insert with multiple values
melin created SPARK-47389:
------------------------------

             Summary: spark jdbc one insert with multiple values
                 Key: SPARK-47389
                 URL: https://issues.apache.org/jira/browse/SPARK-47389
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: melin


Many databases support writing multiple rows of data with a single INSERT statement. Write performance is more efficient than batch execution of multiple SQL statements:

https://github.com/apache/spark/blob/9986462811f160eacd766da8a4e14a9cbb4b8710/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L725

example:
{code:java}
INSERT INTO Customers (Name, Age, Active) VALUES ('Name1',21,1)
INSERT INTO Customers (Name, Age, Active) VALUES ('Name2',21,1)

vs.

INSERT INTO Customers (Name, Age, Active) VALUES ('Name1',21,1), ('Name2',21,1)
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
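A minimal plain-JDBC sketch of building such a multi-row INSERT with a PreparedStatement (hypothetical URL and table; not the JdbcUtils change itself):

{code:scala}
import java.sql.DriverManager

object MultiRowInsertSketch {
  def main(args: Array[String]): Unit = {
    // Requires a JDBC driver on the classpath; URL and table are hypothetical.
    val conn = DriverManager.getConnection("jdbc:postgresql://host/db")
    val rows = Seq(("Name1", 21, 1), ("Name2", 21, 1))

    // One statement, many VALUES tuples: the server parses once and applies
    // all rows in a single round trip.
    val placeholders = Seq.fill(rows.size)("(?, ?, ?)").mkString(", ")
    val stmt = conn.prepareStatement(
      s"INSERT INTO Customers (Name, Age, Active) VALUES $placeholders")
    rows.zipWithIndex.foreach { case ((name, age, active), i) =>
      val base = i * 3 // JDBC parameter indices are 1-based
      stmt.setString(base + 1, name)
      stmt.setInt(base + 2, age)
      stmt.setInt(base + 3, active)
    }
    stmt.executeUpdate()
    conn.close()
  }
}
{code}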
[jira] [Updated] (SPARK-47198) Is it possible to dynamically add backend service to ingress with Kubernetes?
[ https://issues.apache.org/jira/browse/SPARK-47198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-47198:
--------------------------
    Description: 
Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid] path forwards to each Spark app's UI console based on the sparkappid. Spark apps are dynamically added and removed, so the ingress needs to dynamically add Spark services. [sparkappid]_svc == spark svc name

[https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]

[~Qin Yao]

  was:
Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid] path forwards to each Spark app's UI console based on the sparkappid. Spark apps are dynamically added and removed, so the ingress needs to dynamically add Spark services. [sparkappid]_svc == spark svc name

[https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]


> Is it possible to dynamically add backend service to ingress with Kubernetes?
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-47198
>                 URL: https://issues.apache.org/jira/browse/SPARK-47198
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes
>    Affects Versions: 4.0.0
>            Reporter: melin
>            Priority: Major
>
> Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid]
> path forwards to each Spark app's UI console based on the sparkappid. Spark
> apps are dynamically added and removed, so the ingress needs to dynamically
> add Spark services. [sparkappid]_svc == spark svc name
> [https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]
> [~Qin Yao]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47198) Is it possible to dynamically add backend service to ingress with Kubernetes?
[ https://issues.apache.org/jira/browse/SPARK-47198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-47198:
--------------------------
    Description: 
Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid] path forwards to each Spark app's UI console based on the sparkappid. Spark apps are dynamically added and removed, so the ingress needs to dynamically add Spark services. [sparkappid]_svc == spark svc name

[https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]

  was:
Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid] path forwards to each Spark app's UI console based on the sparkappid. Spark apps are dynamically added and removed, so the ingress needs to dynamically add Spark services. sparkappid == spark svc name

[https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]


> Is it possible to dynamically add backend service to ingress with Kubernetes?
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-47198
>                 URL: https://issues.apache.org/jira/browse/SPARK-47198
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes
>    Affects Versions: 4.0.0
>            Reporter: melin
>            Priority: Major
>
> Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid]
> path forwards to each Spark app's UI console based on the sparkappid. Spark
> apps are dynamically added and removed, so the ingress needs to dynamically
> add Spark services. [sparkappid]_svc == spark svc name
> [https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47198) Is it possible to dynamically add backend service to ingress with Kubernetes?
melin created SPARK-47198:
------------------------------

             Summary: Is it possible to dynamically add backend service to ingress with Kubernetes?
                 Key: SPARK-47198
                 URL: https://issues.apache.org/jira/browse/SPARK-47198
             Project: Spark
          Issue Type: New Feature
          Components: Kubernetes
    Affects Versions: 4.0.0
            Reporter: melin


Spark on K8s runs multiple Spark apps at the same time. A proxy/[sparkappid] path forwards to each Spark app's UI console based on the sparkappid. Spark apps are dynamically added and removed, so the ingress needs to dynamically add Spark services. sparkappid == spark svc name

[https://matthewpalmer.net/kubernetes-app-developer/articles/kubernetes-ingress-guide-nginx-example.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file
[ https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin resolved SPARK-47114.
---------------------------
    Resolution: Resolved

> In the spark driver pod. Failed to access the krb5 file
> ---------------------------------------------------------
>
>                 Key: SPARK-47114
>                 URL: https://issues.apache.org/jira/browse/SPARK-47114
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes
>    Affects Versions: 3.4.1
>            Reporter: melin
>            Priority: Major
>
> Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). Pod error logs:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed{code}
> This error generally occurs when the krb5 file cannot be found.
> [~yao] [~Qin Yao]
> {code:java}
> ./bin/spark-submit \
>   --master k8s://https://172.18.5.44:6443 \
>   --deploy-mode cluster \
>   --name spark-pi \
>   --class org.apache.spark.examples.SparkPi \
>   --conf spark.executor.instances=1 \
>   --conf spark.kubernetes.submission.waitAppCompletion=true \
>   --conf spark.kubernetes.driver.pod.name=spark-xxx \
>   --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
>   --conf spark.kubernetes.driver.label.profile=production \
>   --conf spark.kubernetes.executor.label.profile=production \
>   --conf spark.kubernetes.namespace=superior \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>   --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
>   --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
>   --conf spark.kubernetes.container.image.pullPolicy=Always \
>   --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
>   --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
>   --conf spark.kerberos.principal=superior/ad...@datacyber.com \
>   --conf spark.kerberos.keytab=/root/superior.keytab \
>   file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
> {code:java}
> (base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
> Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
>     at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
>     at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
>     at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
>     at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
>     at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
>     at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
>     at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
>     at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
>     at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
>     at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
>     ... 13 more
> (base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
> Name:             spark-xxx
> Namespace:        superior
> Priority:         0
> Service Account:  spark
> Node:             cdh2/172.18.5.45
> Start Time:       Wed, 21 Feb 2024 15:48:08 +0800
> Labels:           profile=production
>                   spark-app-name=spark-pi
>                   spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
>                   spark-role=driver
>                   spark-version=3.4.2
> Annotations:
> Status:           Failed
> IP:               10.244.1.4
> IPs:
>   IP:  10.244.1.4
> Containers:
>   spark-kubernetes-driver:
>     Container ID:
[jira] [Commented] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file
[ https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821469#comment-17821469 ]

melin commented on SPARK-47114:
-------------------------------

The default JRE 17 does not support Kerberos; switch to a JDK.

> In the spark driver pod. Failed to access the krb5 file
> ---------------------------------------------------------
>
>                 Key: SPARK-47114
>                 URL: https://issues.apache.org/jira/browse/SPARK-47114
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes
>    Affects Versions: 3.4.1
>            Reporter: melin
>            Priority: Major
>
> Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). Pod error logs:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed{code}
> This error generally occurs when the krb5 file cannot be found.
> [~yao] [~Qin Yao]
> {code:java}
> ./bin/spark-submit \
>   --master k8s://https://172.18.5.44:6443 \
>   --deploy-mode cluster \
>   --name spark-pi \
>   --class org.apache.spark.examples.SparkPi \
>   --conf spark.executor.instances=1 \
>   --conf spark.kubernetes.submission.waitAppCompletion=true \
>   --conf spark.kubernetes.driver.pod.name=spark-xxx \
>   --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
>   --conf spark.kubernetes.driver.label.profile=production \
>   --conf spark.kubernetes.executor.label.profile=production \
>   --conf spark.kubernetes.namespace=superior \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>   --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
>   --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
>   --conf spark.kubernetes.container.image.pullPolicy=Always \
>   --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
>   --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
>   --conf spark.kerberos.principal=superior/ad...@datacyber.com \
>   --conf spark.kerberos.keytab=/root/superior.keytab \
>   file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
> {code:java}
> (base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
> Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
>     at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
>     at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
>     at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
>     at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
>     at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
>     at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
>     at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
>     at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
>     at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
>     at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
>     ... 13 more
> (base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
> Name:             spark-xxx
> Namespace:        superior
> Priority:         0
> Service Account:  spark
> Node:             cdh2/172.18.5.45
> Start Time:       Wed, 21 Feb 2024 15:48:08 +0800
> Labels:           profile=production
>                   spark-app-name=spark-pi
>                   spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
>                   spark-role=driver
>                   spark-version=3.4.2
> Annotations:
> Status:           Failed
> IP:               10.244.1.4
> IPs:
>   IP:  10.244.1.4
> Containers:
>   spark-kubernetes-driver:
>     Container ID:
[jira] [Updated] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file
[ https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-47114:
--------------------------
    Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). Pod error logs:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed{code}
This error generally occurs when the krb5 file cannot be found.

[~yao] [~Qin Yao]
{code:java}
./bin/spark-submit \
  --master k8s://https://172.18.5.44:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.submission.waitAppCompletion=true \
  --conf spark.kubernetes.driver.pod.name=spark-xxx \
  --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
  --conf spark.kubernetes.driver.label.profile=production \
  --conf spark.kubernetes.executor.label.profile=production \
  --conf spark.kubernetes.namespace=superior \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
  --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
  --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
  --conf spark.kerberos.principal=superior/ad...@datacyber.com \
  --conf spark.kerberos.keytab=/root/superior.keytab \
  file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
    at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
    at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
    at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
    at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
    at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
    at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
    ... 13 more
(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:             spark-xxx
Namespace:        superior
Priority:         0
Service Account:  spark
Node:             cdh2/172.18.5.45
Start Time:       Wed, 21 Feb 2024 15:48:08 +0800
Labels:           profile=production
                  spark-app-name=spark-pi
                  spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
                  spark-role=driver
                  spark-version=3.4.2
Annotations:
Status:           Failed
IP:               10.244.1.4
IPs:
  IP:  10.244.1.4
Containers:
  spark-kubernetes-driver:
    Container ID:  containerd://cceaf13b70cc5f21a639e71cb8663989ec73e122380844624d4bfac3946bae15
    Image:         spark:3.4.1
    Image ID:      docker.io/library/spark@sha256:69fb485a0bcad88f9a2bf066e1b5d555f818126dc9df5a0b7e6a3b6d364bc694
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      org.apache.spark.examples.SparkPi
      spark-internal
      5
    State:       Terminated
      Reason:    Error
      Exit Code: 1
      Started:   Wed, 21 Feb 2024 15:49:54 +0800
      Finished:  Wed, 21 Feb 2024 15:49:56
[jira] [Updated] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file
[ https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-47114:
--------------------------
    Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). Pod error logs:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed{code}
This error generally occurs when the krb5 file cannot be found.

[~yao] [~Qin Yao]
{code:java}
./bin/spark-submit \
  --master k8s://https://172.18.5.44:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.submission.waitAppCompletion=true \
  --conf spark.kubernetes.driver.pod.name=spark-xxx \
  --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
  --conf spark.kubernetes.driver.label.profile=production \
  --conf spark.kubernetes.executor.label.profile=production \
  --conf spark.kubernetes.namespace=superior \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
  --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
  --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
  --conf spark.kerberos.principal=superior/ad...@datacyber.com \
  --conf spark.kerberos.keytab=/root/superior.keytab \
  --conf spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml \
  file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
    at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
    at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
    at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
    at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
    at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
    at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
    ... 13 more
(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:             spark-xxx
Namespace:        superior
Priority:         0
Service Account:  spark
Node:             cdh2/172.18.5.45
Start Time:       Wed, 21 Feb 2024 15:48:08 +0800
Labels:           profile=production
                  spark-app-name=spark-pi
                  spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
                  spark-role=driver
                  spark-version=3.4.2
Annotations:
Status:           Failed
IP:               10.244.1.4
IPs:
  IP:  10.244.1.4
Containers:
  spark-kubernetes-driver:
    Container ID:  containerd://cceaf13b70cc5f21a639e71cb8663989ec73e122380844624d4bfac3946bae15
    Image:         spark:3.4.1
    Image ID:      docker.io/library/spark@sha256:69fb485a0bcad88f9a2bf066e1b5d555f818126dc9df5a0b7e6a3b6d364bc694
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
[jira] [Updated] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file
[ https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-47114:
--------------------------
    Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). Pod error logs:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed{code}
This error generally occurs when the krb5 file cannot be found.

[~yao] [~Qin Yao]
{code:java}
./bin/spark-submit \
  --master k8s://https://172.18.5.44:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.submission.waitAppCompletion=true \
  --conf spark.kubernetes.driver.pod.name=spark-xxx \
  --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
  --conf spark.kubernetes.driver.label.profile=production \
  --conf spark.kubernetes.executor.label.profile=production \
  --conf spark.kubernetes.namespace=superior \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
  --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
  --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
  --conf spark.kerberos.principal=superior/ad...@datacyber.com \
  --conf spark.kerberos.keytab=/root/superior.keytab \
  --conf spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml \
  file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ '[' -z /opt/java/openjdk ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
++ command -v readarray
+ '[' readarray ']'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.2.56 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal 5
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
    at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
    at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
    at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
    at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
    at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
    at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
    ... 13 more
(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:
[jira] [Created] (SPARK-47114) In the spark driver pod. Failed to access the krb5 file
melin created SPARK-47114:
------------------------------

             Summary: In the spark driver pod. Failed to access the krb5 file
                 Key: SPARK-47114
                 URL: https://issues.apache.org/jira/browse/SPARK-47114
             Project: Spark
          Issue Type: New Feature
          Components: Kubernetes
    Affects Versions: 3.4.1
            Reporter: melin


Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos).
{code:java}
./bin/spark-submit \
  --master k8s://https://172.18.5.44:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.submission.waitAppCompletion=true \
  --conf spark.kubernetes.driver.pod.name=spark-xxx \
  --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
  --conf spark.kubernetes.driver.label.profile=production \
  --conf spark.kubernetes.executor.label.profile=production \
  --conf spark.kubernetes.namespace=superior \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
  --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
  --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
  --conf spark.kerberos.principal=superior/ad...@datacyber.com \
  --conf spark.kerberos.keytab=/root/superior.keytab \
  --conf spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml \
  file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ '[' -z /opt/java/openjdk ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
++ command -v readarray
+ '[' readarray ']'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.2.56 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal 5
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
    at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
    at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
    at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
    at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
    at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
    at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
    ... 13 more
(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name: spark-xxx
[jira] [Created] (SPARK-46572) [SQL][Enhancement] hint enhancement
melin created SPARK-46572: - Summary: [SQL][Enhancement] hint enhancement Key: SPARK-46572 URL: https://issues.apache.org/jira/browse/SPARK-46572 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: melin https://github.com/StarRocks/starrocks/pull/37356 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-43338:
--------------------------
    Description: 
{code:java}
private[sql] object CatalogManager {
  val SESSION_CATALOG_NAME: String = "spark_catalog"
}{code}
The SESSION_CATALOG_NAME value cannot be modified.

If the platform supports both Hive and Spark SQL, a metadata catalog name of hive_metastore is more appropriate: users copy table names directly, and they carry the hive_metastore catalog prefix. In this case, the default Spark catalog name needs to be changeable.

!image-2023-12-27-09-55-55-693.png!

[~fanjia]

  was:
{code:java}
private[sql] object CatalogManager {
  val SESSION_CATALOG_NAME: String = "spark_catalog"
}{code}
The SESSION_CATALOG_NAME value cannot be modified.

If multiple Hive Metastores exist, the platform manages metadata from multiple HMS instances and classifies it by catalog name, so a different catalog name is required.

[~fanjia]


> Support modify the SESSION_CATALOG_NAME value
> ----------------------------------------------
>
>                 Key: SPARK-43338
>                 URL: https://issues.apache.org/jira/browse/SPARK-43338
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: melin
>            Priority: Major
>         Attachments: image-2023-12-27-09-55-55-693.png
>
>
> {code:java}
> private[sql] object CatalogManager {
>   val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
> The SESSION_CATALOG_NAME value cannot be modified.
> If the platform supports both Hive and Spark SQL, a metadata catalog name of
> hive_metastore is more appropriate: users copy table names directly, and
> they carry the hive_metastore catalog prefix. In this case, the default
> Spark catalog name needs to be changeable.
> !image-2023-12-27-09-55-55-693.png!
> [~fanjia]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-43338:
--------------------------
    Attachment: image-2023-12-27-09-55-55-693.png

> Support modify the SESSION_CATALOG_NAME value
> ----------------------------------------------
>
>                 Key: SPARK-43338
>                 URL: https://issues.apache.org/jira/browse/SPARK-43338
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: melin
>            Priority: Major
>         Attachments: image-2023-12-27-09-55-55-693.png
>
>
> {code:java}
> private[sql] object CatalogManager {
>   val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages metadata from
> multiple HMS instances and classifies it by catalog name, so a different
> catalog name is required.
> [~fanjia]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46518) Support for copy from write compatible postgresql databases (pg, redshift, snowflake, gauss)
[ https://issues.apache.org/jira/browse/SPARK-46518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-46518:
--------------------------
    Description: 
Many databases are now compatible with PostgreSQL syntax and support the COPY FROM syntax. COPY FROM import performance is 10 times higher than that of JDBC batching.

[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/CopyHelper.scala]

Upsert data import is also supported:

[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/DataTunnelJdbcRelationProvider.scala]

!image-2023-12-27-09-44-19-292.png!

[~yao]

  was:
Many databases are now compatible with PostgreSQL syntax and support the COPY FROM syntax. COPY FROM import performance is 10 times higher than that of JDBC batching.

[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/CopyHelper.scala]

Upsert data import is also supported:

[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/DataTunnelJdbcRelationProvider.scala]

!image-2023-12-27-09-43-01-529.png!


> Support for copy from write compatible postgresql databases (pg, redshift,
> snowflake, gauss)
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-46518
>                 URL: https://issues.apache.org/jira/browse/SPARK-46518
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: melin
>            Priority: Major
>         Attachments: image-2023-12-27-09-44-19-292.png
>
>
> Many databases are now compatible with PostgreSQL syntax and support the
> COPY FROM syntax. COPY FROM import performance is 10 times higher than that
> of JDBC batching.
> [https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/CopyHelper.scala]
> Upsert data import is also supported:
> [https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/DataTunnelJdbcRelationProvider.scala]
> !image-2023-12-27-09-44-19-292.png!
>
> [~yao]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46518) Support for copy from write compatible postgresql databases (pg, redshift, snowflake, gauss)
[ https://issues.apache.org/jira/browse/SPARK-46518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-46518:
--------------------------
    Attachment: image-2023-12-27-09-44-19-292.png

> Support for copy from write compatible postgresql databases (pg, redshift,
> snowflake, gauss)
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-46518
>                 URL: https://issues.apache.org/jira/browse/SPARK-46518
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: melin
>            Priority: Major
>         Attachments: image-2023-12-27-09-44-19-292.png
>
>
> Many databases are now compatible with PostgreSQL syntax and support the
> COPY FROM syntax. COPY FROM import performance is 10 times higher than that
> of JDBC batching.
> [https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/CopyHelper.scala]
> Upsert data import is also supported:
> [https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/DataTunnelJdbcRelationProvider.scala]
> !image-2023-12-27-09-43-01-529.png!
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46518) Support for copy from write compatible postgresql databases (pg, redshift, snowflake, gauss)
melin created SPARK-46518:
------------------------------

             Summary: Support for copy from write compatible postgresql databases (pg, redshift, snowflake, gauss)
                 Key: SPARK-46518
                 URL: https://issues.apache.org/jira/browse/SPARK-46518
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: melin


Many databases are now compatible with PostgreSQL syntax and support the COPY FROM syntax. COPY FROM import performance is 10 times higher than that of JDBC batching.

[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/CopyHelper.scala]

Upsert data import is also supported:

[https://github.com/melin/datatunnel/blob/master/connectors/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/DataTunnelJdbcRelationProvider.scala]

!image-2023-12-27-09-43-01-529.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
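For concreteness, a minimal Scala sketch of the PostgreSQL CopyManager API that a COPY FROM writer could build on (the URL, table, and data are hypothetical; requires the PostgreSQL JDBC driver on the classpath):

{code:scala}
import java.io.StringReader
import java.sql.DriverManager
import org.postgresql.PGConnection

object CopyFromSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection; any COPY-compatible pg-protocol database works.
    val conn = DriverManager.getConnection("jdbc:postgresql://host/db")
    val copyApi = conn.unwrap(classOf[PGConnection]).getCopyAPI

    // COPY streams all rows in one round trip instead of one INSERT per
    // batched row, which is where the large speedup over JDBC batching
    // comes from.
    val csv = "Name1,21,1\nName2,21,1\n"
    val rows = copyApi.copyIn(
      "COPY customers (name, age, active) FROM STDIN WITH (FORMAT csv)",
      new StringReader(csv))
    println(s"copied $rows rows")
    conn.close()
  }
}
{code}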
[jira] [Resolved] (SPARK-46511) Optimize spark jdbc write speed with Multi-Row Inserts
[ https://issues.apache.org/jira/browse/SPARK-46511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin resolved SPARK-46511. --- Resolution: Fixed > Optimize spark jdbc write speed with Multi-Row Inserts > -- > > Key: SPARK-46511 > URL: https://issues.apache.org/jira/browse/SPARK-46511 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: melin >Priority: Major > > INSERT INTO table_name (column1, column2, column3) > VALUES (value1, value2, value3), > (value4, value5, value6), > (value7, value8, value9); -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46511) Optimize spark jdbc write speed with Multi-Row Inserts
[ https://issues.apache.org/jira/browse/SPARK-46511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-46511:
--------------------------
    Description: 
[https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]

mysql, pg, Oracle 23c, sqlserver support:

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3),
       (value4, value5, value6),
       (value7, value8, value9);

  was:
[https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]

mysql, pg, Oracle 23c, sqlserver support:

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3),
       (value4, value5, value6),
       (value7, value8, value9);


> Optimize spark jdbc write speed with Multi-Row Inserts
> -------------------------------------------------------
>
>                 Key: SPARK-46511
>                 URL: https://issues.apache.org/jira/browse/SPARK-46511
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: melin
>            Priority: Major
>
> [https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]
> mysql, pg, Oracle 23c, sqlserver support:
> INSERT INTO table_name (column1, column2, column3)
> VALUES (value1, value2, value3),
>        (value4, value5, value6),
>        (value7, value8, value9);
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46511) Optimize spark jdbc write speed with Multi-Row Inserts
[ https://issues.apache.org/jira/browse/SPARK-46511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-46511:
--------------------------
    Description: 
[https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]

mysql, pg, Oracle 23c, sqlserver support:

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3),
       (value4, value5, value6),
       (value7, value8, value9);

  was:
[https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]

mysql, pg, Oracle 23c, sqlserver

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3),
       (value4, value5, value6),
       (value7, value8, value9);


> Optimize spark jdbc write speed with Multi-Row Inserts
> -------------------------------------------------------
>
>                 Key: SPARK-46511
>                 URL: https://issues.apache.org/jira/browse/SPARK-46511
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: melin
>            Priority: Major
>
> [https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]
> mysql, pg, Oracle 23c, sqlserver support:
> INSERT INTO table_name (column1, column2, column3)
> VALUES (value1, value2, value3),
>        (value4, value5, value6),
>        (value7, value8, value9);



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46511) Optimize spark jdbc write speed with Multi-Row Inserts
melin created SPARK-46511:
------------------------------

             Summary: Optimize spark jdbc write speed with Multi-Row Inserts
                 Key: SPARK-46511
                 URL: https://issues.apache.org/jira/browse/SPARK-46511
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 4.0.0
         Environment: [https://brian.pontarelli.com/2011/06/21/jdbc-batch-vs-multi-row-inserts/]

mysql, pg, Oracle 23c, sqlserver

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3),
       (value4, value5, value6),
       (value7, value8, value9);
            Reporter: melin




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
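A related data point worth noting: for MySQL specifically, the Connector/J driver can already rewrite JDBC batches into multi-row INSERTs on its own via the rewriteBatchedStatements URL parameter, with no Spark change. A sketch using Spark's standard JDBC writer (the host and table are hypothetical); other databases need the multi-row statement built up front, which is this issue's ask:

{code:scala}
import org.apache.spark.sql.SparkSession

object RewriteBatchedStatements {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-write").getOrCreate()
    val df = spark.range(1000).toDF("id")

    // With rewriteBatchedStatements=true, MySQL Connector/J collapses the
    // batched single-row INSERTs that Spark issues into multi-row
    // INSERT ... VALUES statements on the driver side.
    df.write.format("jdbc")
      .option("url", "jdbc:mysql://host/db?rewriteBatchedStatements=true") // hypothetical host
      .option("dbtable", "ids") // hypothetical table
      .option("user", "u")
      .option("password", "p")
      .mode("append")
      .save()
    spark.stop()
  }
}
{code}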
[jira] [Commented] (SPARK-43338) Support modify the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796975#comment-17796975 ]

melin commented on SPARK-43338:
-------------------------------

[~yao] Databricks supports changing it via spark.databricks.sql.initial.catalog.name:
https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html

> Support modify the SESSION_CATALOG_NAME value
> ----------------------------------------------
>
>                 Key: SPARK-43338
>                 URL: https://issues.apache.org/jira/browse/SPARK-43338
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: melin
>            Priority: Major
>
> {code:java}
> private[sql] object CatalogManager {
>   val SESSION_CATALOG_NAME: String = "spark_catalog"
> }{code}
> The SESSION_CATALOG_NAME value cannot be modified.
> If multiple Hive Metastores exist, the platform manages metadata from
> multiple HMS instances and classifies it by catalog name, so a different
> catalog name is required.
> [~fanjia]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
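A possible partial workaround under current Spark, sketched below: spark.sql.catalog.<name> registers an additional catalog under an arbitrary name, though the builtin session catalog itself remains hard-wired to "spark_catalog", which is what this issue asks to change. The plugin class and table names here are hypothetical:

{code:scala}
import org.apache.spark.sql.SparkSession

object ExtraCatalogName {
  def main(args: Array[String]): Unit = {
    // spark.sql.catalog.<name> maps <name> to a CatalogPlugin implementation,
    // so queries can qualify tables as hive_metastore.db.tbl. This adds a
    // name; it does not rename the default spark_catalog.
    val spark = SparkSession.builder()
      .appName("extra-catalog")
      .config("spark.sql.catalog.hive_metastore",
        "org.example.MyCatalogPlugin") // hypothetical plugin class
      .getOrCreate()

    spark.sql("SELECT * FROM hive_metastore.db.tbl").show() // hypothetical table
    spark.stop()
  }
}
{code}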
[jira] [Created] (SPARK-46195) Support parsing multiple SQL statements
melin created SPARK-46195: - Summary: Support parsing multiple SQL statements Key: SPARK-46195 URL: https://issues.apache.org/jira/browse/SPARK-46195 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: melin In the SqlBaseParser.g4 file, add the following rules to support parsing multiple SQL statements. For example, select * from (select * from test) resolves into two statements, so an alias needs to be added. {code:java} sqlStatements : singleStatement* EOF ; singleStatement : statement SEMICOLON? ; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45818) Support columnar/vectorized evaluation engine JDK17/21
melin created SPARK-45818: - Summary: Support columnar/vectorized evaluation engine JDK17/21 Key: SPARK-45818 URL: https://issues.apache.org/jira/browse/SPARK-45818 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 4.0.0 Reporter: melin Trino uses a JDK-based columnar/vectorized evaluation engine to improve performance: [https://github.com/trinodb/trino/pull/19302] [https://github.com/trinodb/trino/issues/14237] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
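For context, a minimal sketch of the SIMD-style loop such an engine relies on, written against the JDK incubator Vector API (an assumption for illustration, not necessarily Trino's exact approach; requires JDK 17+ and --add-modules jdk.incubator.vector):

{code:java}
import jdk.incubator.vector.{FloatVector, VectorSpecies}

object VectorAddSketch {
  private val SPECIES: VectorSpecies[java.lang.Float] = FloatVector.SPECIES_PREFERRED

  def add(a: Array[Float], b: Array[Float], out: Array[Float]): Unit = {
    var i = 0
    val bound = SPECIES.loopBound(a.length)
    while (i < bound) {                       // vectorized main loop
      val va = FloatVector.fromArray(SPECIES, a, i)
      val vb = FloatVector.fromArray(SPECIES, b, i)
      va.add(vb).intoArray(out, i)
      i += SPECIES.length()
    }
    while (i < a.length) {                    // scalar tail for the remainder
      out(i) = a(i) + b(i); i += 1
    }
  }
}
{code}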
[jira] [Commented] (SPARK-45140) Support ddl output json format
[ https://issues.apache.org/jira/browse/SPARK-45140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764488#comment-17764488 ] melin commented on SPARK-45140: --- We have a product that uses JDBC to collect metadata from various databases. To support Hudi / Paimon DDL, we start Spark Thrift Server and collect Hive table metadata, running the "show create table extended table_name" command to obtain table details. Parsing the text format is very inconvenient; we want to support a JSON output format. > Support ddl output json format > -- > > Key: SPARK-45140 > URL: https://issues.apache.org/jira/browse/SPARK-45140 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 > Environment: > > > >Reporter: melin >Priority: Major > > Hive supports DDL output in JSON format: set hive.ddl.output.format=json; > Hive table metadata is collected and output in JSON format for easy parsing > [~yao] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45140) Support ddl output json format
[ https://issues.apache.org/jira/browse/SPARK-45140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-45140: -- Description: Hive supports DDL output in JSON format: set hive.ddl.output.format=json; Hive table metadata is collected and output in JSON format for easy parsing [~yao] was: Hive supports DDL output in JSON format: set hive.ddl.output.format=json; Hive table metadata is collected and output in JSON format for easy parsing > Support ddl output json format > -- > > Key: SPARK-45140 > URL: https://issues.apache.org/jira/browse/SPARK-45140 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 > Environment: > > > >Reporter: melin >Priority: Major > > Hive supports DDL output in JSON format: set hive.ddl.output.format=json; > Hive table metadata is collected and output in JSON format for easy parsing > [~yao] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45140) Support ddl output json format
[ https://issues.apache.org/jira/browse/SPARK-45140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-45140: -- Description: Hive supports DDL output in JSON format: set hive.ddl.output.format=json; Hive table metadata is collected and output in JSON format for easy parsing > Support ddl output json format > -- > > Key: SPARK-45140 > URL: https://issues.apache.org/jira/browse/SPARK-45140 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 > Environment: > > > >Reporter: melin >Priority: Major > > Hive supports DDL output in JSON format: set hive.ddl.output.format=json; > Hive table metadata is collected and output in JSON format for easy parsing > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45140) Support ddl output json format
[ https://issues.apache.org/jira/browse/SPARK-45140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-45140: -- Environment: was: Hive supports DDL output in JSON format: set hive.ddl.output.format=json; Hive table metadata is collected and output in JSON format for easy parsing > Support ddl output json format > -- > > Key: SPARK-45140 > URL: https://issues.apache.org/jira/browse/SPARK-45140 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 > Environment: > > > >Reporter: melin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45140) Support ddl output json format
melin created SPARK-45140: - Summary: Support ddl output json format Key: SPARK-45140 URL: https://issues.apache.org/jira/browse/SPARK-45140 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Environment: Hive supports DDL output in JSON format: set hive.ddl.output.format=json; Hive table metadata is collected and output in JSON format for easy parsing Reporter: melin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43338) Support modifying the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743701#comment-17743701 ] melin commented on SPARK-43338: --- [~yao] Would you consider supporting a custom catalog name? Example: spark.sql.session.catalog.default.name=spark_catalog > Support modifying the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances > and classifies it by catalogName. A different catalog name is required. > [~fanjia] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44015) Support SIMILAR TO operator
melin created SPARK-44015: - Summary: Support SIMILAR TO operator Key: SPARK-44015 URL: https://issues.apache.org/jira/browse/SPARK-44015 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.5.0 Reporter: melin https://www.w3resource.com/PostgreSQL/postgresql-similar-operator.php -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
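For reference, PostgreSQL's SIMILAR TO mixes LIKE-style wildcards with regex alternation; until Spark has the operator, an equivalent predicate can be written with RLIKE. A sketch (the literal and pattern are made up for illustration, and spark is assumed to be an active SparkSession):

{code:java}
// PostgreSQL: 'abc' SIMILAR TO '(a|b)%'  -- % is a LIKE-style wildcard
// Spark today: translate % to .* and anchor the regex, then use RLIKE.
spark.sql("SELECT 'abc' RLIKE '^(a|b).*$' AS matched").show()
// matched = true
{code}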
[jira] [Created] (SPARK-44014) Support BETWEEN SYMMETRIC operator
melin created SPARK-44014: - Summary: Support BETWEEN SYMMETRIC operator Key: SPARK-44014 URL: https://issues.apache.org/jira/browse/SPARK-44014 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.5.0 Reporter: melin https://andreigridnev.com/blog/2016-03-20-between-symmetric-operator-in-postgresql/ -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
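BETWEEN SYMMETRIC accepts its bounds in either order; until Spark supports it, the same predicate can be expressed with least/greatest. A sketch (assumes an active SparkSession named spark):

{code:java}
// PostgreSQL: 5 BETWEEN SYMMETRIC 10 AND 1   (bounds given out of order)
// Equivalent Spark SQL today:
spark.sql("SELECT 5 BETWEEN least(10, 1) AND greatest(10, 1) AS in_range").show()
// in_range = true
{code}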
[jira] [Commented] (SPARK-43776) [BUG] MySQL jdbc cursor has not taken effect
[ https://issues.apache.org/jira/browse/SPARK-43776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727226#comment-17727226 ] melin commented on SPARK-43776: --- For MySQL JDBC streaming reads, stmt.setFetchSize(Integer.MIN_VALUE) needs to be set; the default is buffer mode, which easily causes OOM. JDBCRDD reads data in buffer mode, while JdbcRDD is set by default to read data in streaming mode. These two RDDs are inconsistent. [~srowen] > [BUG] MySQL jdbc cursor has not taken effect > > > Key: SPARK-43776 > URL: https://issues.apache.org/jira/browse/SPARK-43776 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: melin >Priority: Major > > JDBCRDD.scala: stmt.setFetchSize(options.fetchSize) > > JdbcRDD.scala: > {code:java} > if (url.startsWith("jdbc:mysql:")){ > // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force > // streaming results, rather than pulling entire resultset into memory. > // See the below URL > // > dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html > stmt.setFetchSize(Integer.MIN_VALUE) } > else > { stmt.setFetchSize(100) } > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
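A possible workaround at the DataFrame level rather than a fix: MySQL Connector/J can also stream through a server-side cursor when useCursorFetch=true is set on the URL, letting a positive fetchsize take effect. A sketch (host, database, table, and credentials are hypothetical):

{code:java}
// The point here is the useCursorFetch URL flag plus a positive fetchsize.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/demo?useCursorFetch=true")
  .option("dbtable", "big_table")
  .option("user", "demo")
  .option("password", "demo")
  .option("fetchsize", "10000")   // rows per round trip, not the whole resultset
  .load()
{code}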
[jira] [Updated] (SPARK-43805) Support SELECT * EXCEPT AND SELECT * REPLACE
[ https://issues.apache.org/jira/browse/SPARK-43805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43805: -- Description: ref: [https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_replace [~fanjia] was: ref: [https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] [~fanjia] > Support SELECT * EXCEPT AND SELECT * REPLACE > - > > Key: SPARK-43805 > URL: https://issues.apache.org/jira/browse/SPARK-43805 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > ref: > [https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] > https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_replace > [~fanjia] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43805) Support SELECT * EXCEPT AND SELECT * REPLACE
[ https://issues.apache.org/jira/browse/SPARK-43805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43805: -- Description: ref: [https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] [~fanjia] was: ref: [https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] > Support SELECT * EXCEPT AND SELECT * REPLACE > - > > Key: SPARK-43805 > URL: https://issues.apache.org/jira/browse/SPARK-43805 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > ref: > [https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] > [~fanjia] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43805) Support SELECT * EXCEPT AND SELECT * REPLACE
melin created SPARK-43805: - Summary: Support SELECT * EXCEPT AND SELECT * REPLACE Key: SPARK-43805 URL: https://issues.apache.org/jira/browse/SPARK-43805 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.5.0 Reporter: melin ref: [https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
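For a sense of the gap, the BigQuery forms referenced above can be approximated today with the DataFrame API. A sketch (df and the column names are hypothetical):

{code:java}
import org.apache.spark.sql.functions.lit

// SELECT * EXCEPT (order_id) FROM orders
val withoutId = df.drop("order_id")

// SELECT * REPLACE ('widget' AS item_name) FROM orders
val replaced = df.withColumn("item_name", lit("widget"))
{code}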
[jira] [Updated] (SPARK-43776) [BUG] MySQL jdbc cursor has not taken effect
[ https://issues.apache.org/jira/browse/SPARK-43776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43776: -- Description: JDBCRDD.scala: stmt.setFetchSize(options.fetchSize) JdbcRDD.scala: {code:java} if (url.startsWith("jdbc:mysql:")){ // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force // streaming results, rather than pulling entire resultset into memory. // See the below URL // dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html stmt.setFetchSize(Integer.MIN_VALUE) } else { stmt.setFetchSize(100) } {code} was: JDBCRDD.scala: stmt.setFetchSize(options.fetchSize) JdbcRDD.scala: {code:java} if (url.startsWith("jdbc:mysql:")) { // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force // streaming results, rather than pulling entire resultset into memory. // See the below URL // dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html stmt.setFetchSize(Integer.MIN_VALUE) } else { stmt.setFetchSize(100) } {code} > [BUG] MySQL jdbc cursor has not taken effect > > > Key: SPARK-43776 > URL: https://issues.apache.org/jira/browse/SPARK-43776 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: melin >Priority: Major > > JDBCRDD.scala: stmt.setFetchSize(options.fetchSize) > > JdbcRDD.scala: > {code:java} > if (url.startsWith("jdbc:mysql:")){ > // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force > // streaming results, rather than pulling entire resultset into memory. > // See the below URL > // > dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html > stmt.setFetchSize(Integer.MIN_VALUE) } > else > { stmt.setFetchSize(100) } > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43776) [BUG] MySQL jdbc cursor has not taken effect
melin created SPARK-43776: - Summary: [BUG] MySQL jdbc cursor has not taken effect Key: SPARK-43776 URL: https://issues.apache.org/jira/browse/SPARK-43776 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.4.1, 3.5.0 Reporter: melin JDBCRDD.scala: stmt.setFetchSize(options.fetchSize) JdbcRDD.scala: ``` if (url.startsWith("jdbc:mysql:")) { // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force // streaming results, rather than pulling entire resultset into memory. // See the below URL // dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html stmt.setFetchSize(Integer.MIN_VALUE) } else { stmt.setFetchSize(100) } ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43776) [BUG] MySQL jdbc cursor has not taken effect
[ https://issues.apache.org/jira/browse/SPARK-43776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43776: -- Description: JDBCRDD.scala: stmt.setFetchSize(options.fetchSize) JdbcRDD.scala: {code:java} if (url.startsWith("jdbc:mysql:")) { // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force // streaming results, rather than pulling entire resultset into memory. // See the below URL // dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html stmt.setFetchSize(Integer.MIN_VALUE) } else { stmt.setFetchSize(100) } {code} was: JDBCRDD.scala: stmt.setFetchSize(options.fetchSize) JdbcRDD.scala: ``` if (url.startsWith("jdbc:mysql:")) { // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force // streaming results, rather than pulling entire resultset into memory. // See the below URL // dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html stmt.setFetchSize(Integer.MIN_VALUE) } else { stmt.setFetchSize(100) } ``` > [BUG] MySQL jdbc cursor has not taken effect > > > Key: SPARK-43776 > URL: https://issues.apache.org/jira/browse/SPARK-43776 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: melin >Priority: Major > > JDBCRDD.scala: stmt.setFetchSize(options.fetchSize) > > JdbcRDD.scala: > {code:java} > if (url.startsWith("jdbc:mysql:")) > { // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force > // streaming results, rather than pulling entire resultset into memory. // > See the below URL // > dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html > stmt.setFetchSize(Integer.MIN_VALUE) } > else > { stmt.setFetchSize(100) } > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43748) Support DISTINCT ON
melin created SPARK-43748: - Summary: Support DISTINCT ON Key: SPARK-43748 URL: https://issues.apache.org/jira/browse/SPARK-43748 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.5.0 Reporter: melin ref: https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-select-distinct/ -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
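PostgreSQL's DISTINCT ON keeps one row per group; the usual Spark equivalent today is a row_number window. A sketch against a hypothetical orders table (spark is an active SparkSession):

{code:java}
// DISTINCT ON (customer_id) ... ORDER BY customer_id, amount DESC, emulated:
spark.sql("""
  SELECT customer_id, amount
  FROM (
    SELECT *, row_number() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rn
    FROM orders
  ) t
  WHERE rn = 1
""")
{code}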
[jira] [Commented] (SPARK-43338) Support modifying the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724942#comment-17724942 ] melin commented on SPARK-43338: --- Kyuubi verified it: [https://kyuubi.readthedocs.io/en/v1.7.1-rc0/connector/spark/hive.html] Kyuubi is implemented based on HiveSessionCatalog. If there are Hudi tables in the Hive database, another Hudi catalog needs to be registered. The same HMS then has two catalog names, which does not meet my requirements. > Support modifying the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances > and classifies it by catalogName. A different catalog name is required. > [~fanjia] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
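For reference, the Kyuubi connector linked above registers an HMS under an arbitrary v2 catalog name via configuration; a sketch of that setup (the catalog name and metastore URI are hypothetical, and the exact class name should be checked against the Kyuubi docs):

{code:java}
import org.apache.spark.sql.SparkSession

// Register the same HMS under a custom catalog name (sketch).
val spark = SparkSession.builder()
  .config("spark.sql.catalog.my_hive",
    "org.apache.kyuubi.spark.connector.hive.HiveTableCatalog")
  .config("spark.sql.catalog.my_hive.hive.metastore.uris",
    "thrift://metastore-host:9083")
  .getOrCreate()

// spark.sql("SELECT * FROM my_hive.db.tbl")
{code}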
[jira] [Commented] (SPARK-43338) Support modifying the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724933#comment-17724933 ] melin commented on SPARK-43338: --- I don't need to access multiple HMS instances in the same SparkSession; I only need to access one of them. Each HMS is assigned a unique catalogName only so that the meta tableId is unique: catalog.database.table. > Support modifying the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances > and classifies it by catalogName. A different catalog name is required. > [~fanjia] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37351) Supports write data flow control
[ https://issues.apache.org/jira/browse/SPARK-37351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-37351: -- Description: The hive table data is written to a relational database, generally an online production database. If the writing speed has no traffic control, it can easily affect the stability of the online system. It is recommended to add traffic control parameters [~fanjia] was:The hive table data is written to a relational database, generally an online production database. If the writing speed has no traffic control, it can easily affect the stability of the online system. It is recommended to add traffic control parameters > Supports write data flow control > > > Key: SPARK-37351 > URL: https://issues.apache.org/jira/browse/SPARK-37351 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: melin >Priority: Major > > The hive table data is written to a relational database, generally an online > production database. If the writing speed has no traffic control, it can > easily affect the stability of the online system. It is recommended to add > traffic control parameters > [~fanjia] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
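One way the requested traffic control could look on the write path, sketched with Guava's RateLimiter inside foreachPartition (df is an assumed DataFrame, and the rows/second value and write call are placeholders):

{code:java}
import com.google.common.util.concurrent.RateLimiter
import org.apache.spark.sql.Row

// Sketch: throttle writes to ~1000 rows/sec per partition.
df.foreachPartition { (rows: Iterator[Row]) =>
  val limiter = RateLimiter.create(1000.0) // permits (rows) per second
  rows.foreach { row =>
    limiter.acquire()                      // blocks until a permit is available
    // write `row` to the target database here
  }
}
{code}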
[jira] [Updated] (SPARK-43338) Support modifying the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43338: -- Description: {code:java} private[sql] object CatalogManager { val SESSION_CATALOG_NAME: String = "spark_catalog" }{code} The SESSION_CATALOG_NAME value cannot be modified. If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances and classifies it by catalogName. A different catalog name is required. [~fanjia] was: {code:java} private[sql] object CatalogManager { val SESSION_CATALOG_NAME: String = "spark_catalog" }{code} The SESSION_CATALOG_NAME value cannot be modified. If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances and classifies it by catalogName. A different catalog name is required. [~yao] > Support modifying the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances > and classifies it by catalogName. A different catalog name is required. > [~fanjia] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43521) Support CREATE TABLE LIKE FILE for PARQUET
melin created SPARK-43521: - Summary: Support CREATE TABLE LIKE FILE for PARQUET Key: SPARK-43521 URL: https://issues.apache.org/jira/browse/SPARK-43521 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: melin ref: https://issues.apache.org/jira/browse/HIVE-26395 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
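Until a CREATE TABLE LIKE FILE syntax exists, the schema can be lifted from a Parquet file by hand; a sketch (the path and table name are hypothetical):

{code:java}
// Infer the schema from the file, then create a table with the same columns.
val schema = spark.read.parquet("/data/sample.parquet").schema
spark.sql(s"CREATE TABLE t (${schema.toDDL}) USING parquet")
{code}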
[jira] [Updated] (SPARK-43400) Add Primary Key syntax support
[ https://issues.apache.org/jira/browse/SPARK-43400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43400: -- Summary: Add Primary Key syntax support (was: create table support the PRIMARY KEY keyword) > Add Primary Key syntax support > -- > > Key: SPARK-43400 > URL: https://issues.apache.org/jira/browse/SPARK-43400 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > apache paimon and hudi support primary key definitions. It is necessary to > support the primary key definition syntax > https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint#constraint-properties > [~gurwls223] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-43338) Support modifying the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720831#comment-17720831 ] melin edited comment on SPARK-43338 at 5/9/23 7:38 AM: --- If the same Hive database has Parquet and Hudi tables, does HiveTableCatalog support access to the Hudi tables? We do not want to register two catalogs. was (Author: melin): If the same Hive database has Parquet and Hudi tables, does HiveTableCatalog support access to the Hudi tables? > Support modifying the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances > and classifies it by catalogName. A different catalog name is required. > [~yao] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43338) Support modifying the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720831#comment-17720831 ] melin commented on SPARK-43338: --- If the same Hive database has Parquet and Hudi tables, does HiveTableCatalog support access to the Hudi tables? > Support modifying the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances > and classifies it by catalogName. A different catalog name is required. > [~yao] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43338) Support modifying the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720794#comment-17720794 ] melin commented on SPARK-43338: --- You are reading this as a bigger feature than it is. Only one HMS is accessed in a SparkSession; I just want spark_catalog to be modifiable. For example, if you have two Hadoop clusters, there should be two HMS instances. A metadata management platform (similar to Databricks Unity Catalog) acquires the HMS metadata and, to keep identifiers unique, needs to add a catalogName (tableId: catalogName.schemaName.tableName). When Spark accesses Hive tables, the catalog name should be consistent with the tableId's catalogName instead of spark_catalog. > Support modifying the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances > and classifies it by catalogName. A different catalog name is required. > [~yao] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43400) create table support the PRIMARY KEY keyword
[ https://issues.apache.org/jira/browse/SPARK-43400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43400: -- Description: apache paimon and hudi support primary key definitions. It is necessary to support the primary key definition syntax https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint#constraint-properties [~gurwls223] was: apache paimon and hudi support primary key definitions. It is necessary to support the primary key definition syntax [~gurwls223] > create table support the PRIMARY KEY keyword > > > Key: SPARK-43400 > URL: https://issues.apache.org/jira/browse/SPARK-43400 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > apache paimon and hudi support primary key definitions. It is necessary to > support the primary key definition syntax > https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint#constraint-properties > [~gurwls223] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
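As a concrete point of comparison, Paimon's Spark integration today carries the key through a table property rather than dedicated grammar; a sketch (catalog, database, and table names are hypothetical):

{code:java}
// How Paimon expresses a primary key via TBLPROPERTIES today (sketch):
spark.sql("""
  CREATE TABLE paimon.db.orders (
    order_id BIGINT,
    customer STRING
  ) TBLPROPERTIES ('primary-key' = 'order_id')
""")
// The proposal is first-class syntax instead, e.g. order_id BIGINT PRIMARY KEY.
{code}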
[jira] [Updated] (SPARK-43400) create table support the PRIMARY KEY keyword
[ https://issues.apache.org/jira/browse/SPARK-43400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43400: -- Description: apache paimon and hudi support primary key definitions. It is necessary to support the primary key definition syntax [~gurwls223] was:apache paimon and hudi support primary key definitions. It is necessary to support the primary key definition syntax > create table support the PRIMARY KEY keyword > > > Key: SPARK-43400 > URL: https://issues.apache.org/jira/browse/SPARK-43400 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > apache paimon and hudi support primary key definitions. It is necessary to > support the primary key definition syntax > [~gurwls223] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43400) create table support the PRIMARY KEY keyword
melin created SPARK-43400: - Summary: create table support the PRIMARY KEY keyword Key: SPARK-43400 URL: https://issues.apache.org/jira/browse/SPARK-43400 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: melin apache paimon and hudi support primary key definitions. It is necessary to support the primary key definition syntax -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported
[ https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720110#comment-17720110 ] melin edited comment on SPARK-43382 at 5/7/23 8:35 AM: --- There is an idea to customize the Hadoop FileSystem based on Commons VFS. Commons VFS supports reading different archive files. simple demo: [https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/] {code:java} spark.read.option("header", "true") .csv("vfs://tgz:ftp://fcftp:fcftp@172.18.1.52/csv.tar.gz!/csv").show() spark.read.option("header", "true") .csv("vfs://tgz:s3://BxiljVd5YZa3mRUn:3Mq9dsmdMbN1JipE1TlOF7OuDkuYBYpe@cdh1:9300/demo-bucket/csv.tar.gz!/csv").show() spark.read.option("header", "true") .csv("vfs://tgz:sftp://test:test2023@172.18.5.46:22/ftpdata/csv.tar.gz!/csv").show() {code} was (Author: melin): There is an idea to customize the Hadoop FileSystem based on Commons VFS. Commons VFS supports reading different archive files. simple demo: [https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/|https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala] {code:java} spark.read.option("header", "true") .csv("vfs://tgz:ftp://fcftp:fcftp@172.18.1.52/csv.tar.gz!/csv").show() spark.read.option("header", "true") .csv("vfs://tgz:s3://BxiljVd5YZa3mRUn:3Mq9dsmdMbN1JipE1TlOF7OuDkuYBYpe@cdh1:9300/demo-bucket/csv.tar.gz!/csv").show() spark.read.option("header", "true") .csv("vfs://tgz:sftp://test:test2023@172.18.5.46:22/ftpdata/csv.tar.gz!/csv").show() {code} > Read and write csv and json files. Archive files such as zip or gz are > supported > > > Key: SPARK-43382 > URL: https://issues.apache.org/jira/browse/SPARK-43382 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Snowflake data import and export support compressed files. For example: > > {code:java} > COPY INTO @mystage/data.csv.gz > > COPY INTO mytable > FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; > FILE_FORMAT = (TYPE = 'JSON') > MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; > > {code} > Can Spark directly read archive files? > {code:java} > spark.read.csv("/tutorials/dataloading/sales.json.gz") > {code} > @[~kaifeiYi] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported
[ https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720110#comment-17720110 ] melin edited comment on SPARK-43382 at 5/7/23 8:34 AM: --- There is an idea to customize the Hadoop FileSystem based on Commons VFS. Commons VFS supports reading different archive files. simple demo: [https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/|https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala] {code:java} spark.read.option("header", "true") .csv("vfs://tgz:ftp://fcftp:fcftp@172.18.1.52/csv.tar.gz!/csv").show() spark.read.option("header", "true") .csv("vfs://tgz:s3://BxiljVd5YZa3mRUn:3Mq9dsmdMbN1JipE1TlOF7OuDkuYBYpe@cdh1:9300/demo-bucket/csv.tar.gz!/csv").show() spark.read.option("header", "true") .csv("vfs://tgz:sftp://test:test2023@172.18.5.46:22/ftpdata/csv.tar.gz!/csv").show() {code} was (Author: melin): There is an idea to customize the Hadoop FileSystem based on Commons VFS. Commons VFS supports reading different archive files. simple demo: [https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala] spark.read.option("header", "true") .csv("vfs://tgz:ftp://fcftp:fcftp@172.18.1.52/csv.tar.gz!/csv").show() > Read and write csv and json files. Archive files such as zip or gz are > supported > > > Key: SPARK-43382 > URL: https://issues.apache.org/jira/browse/SPARK-43382 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Snowflake data import and export support compressed files. For example: > > {code:java} > COPY INTO @mystage/data.csv.gz > > COPY INTO mytable > FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; > FILE_FORMAT = (TYPE = 'JSON') > MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; > > {code} > Can Spark directly read archive files? > {code:java} > spark.read.csv("/tutorials/dataloading/sales.json.gz") > {code} > @[~kaifeiYi] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported
[ https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43382: -- Description: Snowflake data import and export support compressed files. For example: {code:java} COPY INTO @mystage/data.csv.gz COPY INTO mytable FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; FILE_FORMAT = (TYPE = 'JSON') MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; {code} Can Spark directly read archive files? {code:java} spark.read.csv("/tutorials/dataloading/sales.json.gz") {code} @[~kaifeiYi] was: Snowflake data import and export support compressed files. For example: {code:java} COPY INTO @mystage/data.csv.gz COPY INTO mytable FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; FILE_FORMAT = (TYPE = 'JSON') MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; {code} Can Spark directly read archive files? {code:java} spark.read.csv("/tutorials/dataloading/sales.json.gz"){code} @[~kaifeiYi] > Read and write csv and json files. Archive files such as zip or gz are > supported > > > Key: SPARK-43382 > URL: https://issues.apache.org/jira/browse/SPARK-43382 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Snowflake data import and export support compressed files. For example: > > {code:java} > COPY INTO @mystage/data.csv.gz > > COPY INTO mytable > FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; > FILE_FORMAT = (TYPE = 'JSON') > MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; > > {code} > Can Spark directly read archive files? > {code:java} > spark.read.csv("/tutorials/dataloading/sales.json.gz") > {code} > @[~kaifeiYi] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
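Worth noting against the example above: codec-compressed single files already load transparently, since gzip is a built-in Hadoop codec; it is archive containers (zip, tar.gz) that have no direct reader. A sketch (paths are hypothetical):

{code:java}
// Works today: .gz is a Hadoop compression codec (not splittable, one task per file).
val sales = spark.read.json("/tutorials/dataloading/sales.json.gz")

// Not supported today: archive containers that bundle many entries,
// e.g. /data/files.zip or /data/csv.tar.gz - these need unpacking first.
{code}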
[jira] [Comment Edited] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported
[ https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720110#comment-17720110 ] melin edited comment on SPARK-43382 at 5/6/23 3:57 PM: --- There is an idea to customize the Hadoop FileSystem based on Commons VFS. Commons VFS supports reading different archive files. simple demo: [https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala] spark.read.option("header", "true") .csv("vfs://tgz:ftp://fcftp:fcftp@172.18.1.52/csv.tar.gz!/csv").show() was (Author: melin): There is an idea to customize the Hadoop FileSystem based on Commons VFS. Commons VFS supports reading different archive files. simple demo: [https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala] > Read and write csv and json files. Archive files such as zip or gz are > supported > > > Key: SPARK-43382 > URL: https://issues.apache.org/jira/browse/SPARK-43382 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Snowflake data import and export support compressed files. For example: > > {code:java} > COPY INTO @mystage/data.csv.gz > > COPY INTO mytable > FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; > FILE_FORMAT = (TYPE = 'JSON') > MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; > > {code} > Can Spark directly read archive files? > {code:java} > spark.read.csv("/tutorials/dataloading/sales.json.gz"){code} > @[~kaifeiYi] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported
[ https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720110#comment-17720110 ] melin edited comment on SPARK-43382 at 5/6/23 3:57 PM: --- There is an idea to customize the Hadoop FileSystem based on Commons VFS. Commons VFS supports reading different archive files. simple demo: [https://github.com/melin/spark-jobserver/blob/master/jobserver-extensions/src/test/scala/com/github/melin/jobserver/extensions/sql/SparkMaskDemo.scala] was (Author: melin): There is an idea to customize the Hadoop FileSystem based on Commons VFS. Commons VFS supports reading different archive files. > Read and write csv and json files. Archive files such as zip or gz are > supported > > > Key: SPARK-43382 > URL: https://issues.apache.org/jira/browse/SPARK-43382 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Snowflake data import and export support compressed files. For example: > > {code:java} > COPY INTO @mystage/data.csv.gz > > COPY INTO mytable > FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; > FILE_FORMAT = (TYPE = 'JSON') > MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; > > {code} > Can Spark directly read archive files? > {code:java} > spark.read.csv("/tutorials/dataloading/sales.json.gz"){code} > @[~kaifeiYi] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported
[ https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720110#comment-17720110 ] melin commented on SPARK-43382: --- There is an idea to customize the Hadoop FileSystem based on Commons VFS. Commons VFS supports reading different archive files. > Read and write csv and json files. Archive files such as zip or gz are > supported > > > Key: SPARK-43382 > URL: https://issues.apache.org/jira/browse/SPARK-43382 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Snowflake data import and export support compressed files. For example: > > {code:java} > COPY INTO @mystage/data.csv.gz > > COPY INTO mytable > FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; > FILE_FORMAT = (TYPE = 'JSON') > MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; > > {code} > Can Spark directly read archive files? > {code:java} > spark.read.csv("/tutorials/dataloading/sales.json.gz"){code} > @[~kaifeiYi] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported
[ https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43382: -- Description: Snowflake data import and export support compressed files. For example: {code:java} COPY INTO @mystage/data.csv.gz COPY INTO mytable FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; FILE_FORMAT = (TYPE = 'JSON') MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; {code} Can Spark directly read archive files? {code:java} spark.read.csv("/tutorials/dataloading/sales.json.gz"){code} @[~kaifeiYi] was: Snowflake data import and export support compressed files. For example: {code:java} COPY INTO @mystage/data.csv.gz COPY INTO mytable FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; FILE_FORMAT = (TYPE = 'JSON') MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; {code} Can Spark directly read archive files? {code:java} spark.read.csv("/tutorials/dataloading/sales.json.gz"){code} > Read and write csv and json files. Archive files such as zip or gz are > supported > > > Key: SPARK-43382 > URL: https://issues.apache.org/jira/browse/SPARK-43382 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Snowflake data import and export support compressed files. For example: > > {code:java} > COPY INTO @mystage/data.csv.gz > > COPY INTO mytable > FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; > FILE_FORMAT = (TYPE = 'JSON') > MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; > > {code} > Can Spark directly read archive files? > {code:java} > spark.read.csv("/tutorials/dataloading/sales.json.gz"){code} > @[~kaifeiYi] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported
[ https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43382: -- Description: Snowflake data import and export support compressed files. For example: {code:java} COPY INTO @mystage/data.csv.gz COPY INTO mytable FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; FILE_FORMAT = (TYPE = 'JSON') MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; {code} Can Spark directly read archive files? {code:java} spark.read.csv("/tutorials/dataloading/sales.json.gz"){code} > Read and write csv and json files. Archive files such as zip or gz are > supported > > > Key: SPARK-43382 > URL: https://issues.apache.org/jira/browse/SPARK-43382 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Snowflake data import and export support compressed files. For example: > > {code:java} > COPY INTO @mystage/data.csv.gz > > COPY INTO mytable > FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; > FILE_FORMAT = (TYPE = 'JSON') > MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; > > {code} > Can Spark directly read archive files? > {code:java} > spark.read.csv("/tutorials/dataloading/sales.json.gz"){code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported
[ https://issues.apache.org/jira/browse/SPARK-43382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43382: -- Environment: (was: Snowflake data import and export support compressed files. For example: {code:java} COPY INTO @mystage/data.csv.gz COPY INTO mytable FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; FILE_FORMAT = (TYPE = 'JSON') MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; {code} Can Spark directly read archive files? {code:java} spark.read.csv("/tutorials/dataloading/sales.json.gz"){code} ) > Read and write csv and json files. Archive files such as zip or gz are > supported > > > Key: SPARK-43382 > URL: https://issues.apache.org/jira/browse/SPARK-43382 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43382) Read and write csv and json files. Archive files such as zip or gz are supported
melin created SPARK-43382: - Summary: Read and write csv and json files. Archive files such as zip or gz are supported Key: SPARK-43382 URL: https://issues.apache.org/jira/browse/SPARK-43382 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Environment: Snowflake data import and export support compressed files. For example: {code:java} COPY INTO @mystage/data.csv.gz COPY INTO mytable FROM @my_ext_stage/tutorials/dataloading/sales.json.gz; FILE_FORMAT = (TYPE = 'JSON') MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE'; {code} Can Spark directly read archive files? {code:java} spark.read.csv("/tutorials/dataloading/sales.json.gz"){code} Reporter: melin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43338) Support modifying the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43338: -- Description: {code:java} private[sql] object CatalogManager { val SESSION_CATALOG_NAME: String = "spark_catalog" }{code} The SESSION_CATALOG_NAME value cannot be modified. If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances and classifies it by catalogName. A different catalog name is required. [~yao] was: {code:java} private[sql] object CatalogManager { val SESSION_CATALOG_NAME: String = "spark_catalog" }{code} The SESSION_CATALOG_NAME value cannot be modified. If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances and classifies it by catalogName. A different catalog name is required. [~gurwls223] > Support modifying the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances > and classifies it by catalogName. A different catalog name is required. > [~yao] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43338) Support modifying the SESSION_CATALOG_NAME value
[ https://issues.apache.org/jira/browse/SPARK-43338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43338: -- Description: {code:java} private[sql] object CatalogManager { val SESSION_CATALOG_NAME: String = "spark_catalog" }{code} The SESSION_CATALOG_NAME value cannot be modified. If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances and classifies it by catalogName. A different catalog name is required. [~gurwls223] was: {code:java} private[sql] object CatalogManager { val SESSION_CATALOG_NAME: String = "spark_catalog" }{code} The SESSION_CATALOG_NAME value cannot be modified. If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances and classifies it by catalogName. A different catalog name is required. > Support modifying the SESSION_CATALOG_NAME value > -- > > Key: SPARK-43338 > URL: https://issues.apache.org/jira/browse/SPARK-43338 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > {code:java} > private[sql] object CatalogManager { > val SESSION_CATALOG_NAME: String = "spark_catalog" > }{code} > > The SESSION_CATALOG_NAME value cannot be modified. > If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances > and classifies it by catalogName. A different catalog name is required. > > [~gurwls223] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43338) Support modifying the SESSION_CATALOG_NAME value
melin created SPARK-43338: - Summary: Support modifying the SESSION_CATALOG_NAME value Key: SPARK-43338 URL: https://issues.apache.org/jira/browse/SPARK-43338 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: melin {code:java} private[sql] object CatalogManager { val SESSION_CATALOG_NAME: String = "spark_catalog" }{code} The SESSION_CATALOG_NAME value cannot be modified. If multiple Hive Metastores exist, the platform manages metadata for multiple HMS instances and classifies it by catalogName. A different catalog name is required. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43318) Spark csv and json readers support the wholetext parameter
melin created SPARK-43318: - Summary: Spark csv and json readers support the wholetext parameter Key: SPARK-43318 URL: https://issues.apache.org/jira/browse/SPARK-43318 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: melin Fix For: 3.5.0 FTPInputStream used by Hadoop FTPFileSystem does not support seek, so Spark's HadoopFileLinesReader fails to read from it. Support reading the entire file and then splitting it into lines to avoid the read failure. [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPInputStream.java] [~cloud_fan] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
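The text source already has a wholetext option; a workaround sketch that reads the whole file in one row and re-parses it as CSV (the FTP path is hypothetical, and this assumes each file fits in memory):

{code:java}
import spark.implicits._

// Read the entire file as a single row, split it into lines, then parse as CSV.
val whole = spark.read.option("wholetext", "true")
  .text("ftp://user:pass@host/data/input.csv")
  .as[String]
val lines = whole.flatMap(_.split("\n"))
val df = spark.read.option("header", "true").csv(lines)
{code}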
[jira] [Updated] (SPARK-43101) Add CREATE/DROP catalog
[ https://issues.apache.org/jira/browse/SPARK-43101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43101: -- Description: Convenient registration of catalogs in STS (Spark Thrift Server). ref: [https://github.com/trinodb/trino/issues/12709] was: Convenient registration of catalogs in STS (Spark Thrift Server). ref: https://github.com/trinodb/trino/pull/13931 > Add CREATE/DROP catalog > > > Key: SPARK-43101 > URL: https://issues.apache.org/jira/browse/SPARK-43101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Convenient registration of catalogs in STS (Spark Thrift Server). > ref: [https://github.com/trinodb/trino/issues/12709] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43101) Dynamic Catalogs
[ https://issues.apache.org/jira/browse/SPARK-43101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43101: -- Summary: Dynamic Catalogs (was: Add CREATE/DROP catalog ) > Dynamic Catalogs > > > Key: SPARK-43101 > URL: https://issues.apache.org/jira/browse/SPARK-43101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Convenient registration of the catalog, in sts > ref: [https://github.com/trinodb/trino/issues/12709] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43101) Add CREATE/DROP catalog
[ https://issues.apache.org/jira/browse/SPARK-43101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-43101: -- Description: Convenient registration of catalogs in STS (Spark Thrift Server). ref: [https://github.com/trinodb/trino/issues/12709] was: Convenient registration of catalogs in STS (Spark Thrift Server). ref: [https://github.com/trinodb/trino/issues/12709] > Add CREATE/DROP catalog > > > Key: SPARK-43101 > URL: https://issues.apache.org/jira/browse/SPARK-43101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: melin >Priority: Major > > Convenient registration of catalogs in STS (Spark Thrift Server). > ref: [https://github.com/trinodb/trino/issues/12709] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43101) Add CREATE/DROP catalog
melin created SPARK-43101: - Summary: Add CREATE/DROP catalog Key: SPARK-43101 URL: https://issues.apache.org/jira/browse/SPARK-43101 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: melin Convenient runtime registration of catalogs, e.g. in STS (the Spark Thrift Server). ref: https://github.com/trinodb/trino/pull/13931 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
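A sketch of what the request could look like in practice, for a spark-shell session. The DDL is hypothetical syntax modeled on Trino's CREATE CATALOG and does not exist in Spark; the runtime-config form below it relies on catalog plugins being loaded lazily from the configuration, with the catalog name and URL invented for illustration:
{code:java}
// What the ticket asks for -- hypothetical syntax Spark does not parse today:
val proposedDdl =
  """CREATE CATALOG pg
    |USING 'org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog'
    |OPTIONS (url 'jdbc:postgresql://host:5432/db')""".stripMargin

// What should already work without a restart: catalog plugins are looked
// up lazily from the configuration on first reference, so a catalog can
// be registered at runtime through spark.conf.
spark.conf.set(
  "spark.sql.catalog.pg",
  "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
spark.conf.set("spark.sql.catalog.pg.url", "jdbc:postgresql://host:5432/db")
{code}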
[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709588#comment-17709588 ] melin commented on SPARK-38200: --- [~beliefer] MERGE INTO is standard SQL: https://en.wikipedia.org/wiki/Merge_%28SQL%29. MySQL doesn't implement it, but most databases do > [SQL] Spark JDBC Savemode Supports Upsert > - > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > Upsert SQL differs across databases; most databases support MERGE SQL: > sqlserver merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] > mysql: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] > oracle merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] > postgres: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] > postgres merge into sql : > [https://www.postgresql.org/docs/current/sql-merge.html] > db2 merge into sql : > [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] > derby merge into sql: > [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] > h2 merge into sql : > [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] > > [~yao] > > https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
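To make the dialect differences concrete, a small sketch of per-dialect upsert statement generation in the spirit of the SeaTunnel dialect classes linked above (table, column, and key names are illustrative; identifier quoting and NULL handling are omitted):
{code:java}
// Build an upsert statement for dialects with and without MERGE.
def upsertSql(dialect: String, table: String, cols: Seq[String], keys: Seq[String]): String = {
  val colList = cols.mkString(", ")
  val marks = cols.map(_ => "?").mkString(", ")
  val nonKeys = cols.filterNot(keys.contains)
  dialect match {
    case "mysql" =>
      // MySQL has no MERGE; ON DUPLICATE KEY UPDATE is the closest equivalent.
      val updates = nonKeys.map(c => s"$c = VALUES($c)").mkString(", ")
      s"INSERT INTO $table ($colList) VALUES ($marks) ON DUPLICATE KEY UPDATE $updates"
    case "postgresql" =>
      val updates = nonKeys.map(c => s"$c = EXCLUDED.$c").mkString(", ")
      s"INSERT INTO $table ($colList) VALUES ($marks) " +
        s"ON CONFLICT (${keys.mkString(", ")}) DO UPDATE SET $updates"
    case other =>
      sys.error(s"no upsert template for dialect: $other")
  }
}

// upsertSql("postgresql", "t", Seq("id", "update_time"), Seq("id")) yields:
// INSERT INTO t (id, update_time) VALUES (?, ?) ON CONFLICT (id) DO UPDATE SET update_time = EXCLUDED.update_time
{code}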
[jira] [Created] (SPARK-43060) [SQL] Spark JDBC rate limitation
melin created SPARK-43060: - Summary: [SQL] Spark JDBC rate limitation Key: SPARK-43060 URL: https://issues.apache.org/jira/browse/SPARK-43060 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.5.0 Reporter: melin spark jdbc directly reads and writes data into the database, which may affect database stability. Therefore, a speed limit parameter is required. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
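A sketch of the kind of throttling the ticket asks for, done by hand today with Guava's RateLimiter in the write path. The maxRowsPerSecond knob is invented, not an existing Spark option, and Guava is assumed to be on the classpath:
{code:java}
import com.google.common.util.concurrent.RateLimiter
import org.apache.spark.sql.{DataFrame, Row}

// Cap rows written per second, per partition, before each JDBC statement.
// Note the effective cluster-wide rate is roughly maxRowsPerSecond
// multiplied by the number of concurrently running tasks.
def throttledWrite(df: DataFrame, maxRowsPerSecond: Double): Unit = {
  df.rdd.foreachPartition { rows: Iterator[Row] =>
    // One limiter per task; permits are handed out at a steady rate.
    val limiter = RateLimiter.create(maxRowsPerSecond)
    rows.foreach { row =>
      limiter.acquire() // blocks until the next permit is available
      // ... execute the INSERT/UPSERT for `row` over JDBC here ...
    }
  }
}
{code}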
[jira] [Updated] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-38200: -- Description: Upsert SQL differs across databases; most databases support MERGE SQL: sqlserver merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] mysql: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] oracle merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] postgres: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] postgres merge into sql : [https://www.postgresql.org/docs/current/sql-merge.html] db2 merge into sql : [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] derby merge into sql: [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] h2 merge into sql : [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] [~yao] https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect was: Upsert SQL differs across databases; most databases support MERGE SQL: sqlserver merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] mysql: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] oracle merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] postgres: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] postgres merge into sql : [https://www.postgresql.org/docs/current/sql-merge.html] db2 merge into sql : [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] derby merge into sql: [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] h2 merge into sql : [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] [~yao] > [SQL] Spark JDBC Savemode Supports Upsert > - > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > Upsert SQL differs across databases; most databases support MERGE SQL: > sqlserver merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] > mysql: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] > oracle merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] > postgres: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] > postgres merge into sql : > [https://www.postgresql.org/docs/current/sql-merge.html] > db2 merge into sql : > [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] > derby merge into sql: > [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] > h2 merge into sql : > [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] > > [~yao] > > https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail:
[jira] [Updated] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-38200: -- Description: Upsert SQL differs across databases; most databases support MERGE SQL: sqlserver merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] mysql: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] oracle merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] postgres: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] postgres merge into sql : [https://www.postgresql.org/docs/current/sql-merge.html] db2 merge into sql : [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] derby merge into sql: [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] h2 merge into sql : [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] [~yao] was: Upsert SQL differs across databases; most databases support MERGE SQL: sqlserver merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] mysql: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] oracle merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] postgres: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] postgres merge into sql : [https://www.postgresql.org/docs/current/sql-merge.html] db2 merge into sql : [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] derby merge into sql: [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] h2 merge into sql : [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] [~maxgekk] > [SQL] Spark JDBC Savemode Supports Upsert > - > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > Upsert SQL differs across databases; most databases support MERGE SQL: > sqlserver merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] > mysql: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] > oracle merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] > postgres: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] > postgres merge into sql : > [https://www.postgresql.org/docs/current/sql-merge.html] > db2 merge into sql : > [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] > derby merge into sql: > [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] > h2 merge into sql : > [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] > > [~yao] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle
[ https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-42627: -- Description: {code:java} Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL type -102 at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) {code} oracle driver {code:java} com.oracle.database.jdbc ojdbc8 21.9.0.0 {code} oracle sql: {code:java} CREATE TABLE "ORDERS" ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, "PRICE" NUMBER(10,5) NOT NULL ENABLE, "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, PRIMARY KEY ("ORDER_ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" ENABLE, SUPPLEMENTAL LOG DATA (ALL) COLUMNS ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" {code} [~yao] was: {code:java} Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL type -102 at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) {code} oracle driver {code:java} com.oracle.database.jdbc ojdbc8 21.9.0.0 {code} oracle sql: {code:java} CREATE TABLE "ORDERS" ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, "PRICE" NUMBER(10,5) NOT NULL ENABLE, "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, PRIMARY KEY ("ORDER_ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
[jira] [Updated] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle
[ https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-42627: -- Description: {code:java} Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL type -102 at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) {code} oracle driver {code:java} com.oracle.database.jdbc ojdbc8 21.9.0.0 {code} oracle sql: {code:java} CREATE TABLE "ORDERS" ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, "PRICE" NUMBER(10,5) NOT NULL ENABLE, "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, PRIMARY KEY ("ORDER_ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" ENABLE, SUPPLEMENTAL LOG DATA (ALL) COLUMNS ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" {code} [~maxgekk] was: {code:java} Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL type -102 at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) {code} oracle driver {code:java} com.oracle.database.jdbc ojdbc8 21.9.0.0 {code} oracle sql: {code:java} CREATE TABLE "ORDERS" ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, "PRICE" NUMBER(10,5) NOT NULL ENABLE, "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, PRIMARY KEY ("ORDER_ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS
[jira] [Comment Edited] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle
[ https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17707619#comment-17707619 ] melin edited comment on SPARK-42627 at 4/2/23 6:50 AM: --- Unsupported type: TIMESTAMP (3) WITH LOCAL TIME ZONE [~srowen] was (Author: melin): Unsupported type: TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE [~srowen] > Spark: Getting SQLException: Unsupported type -102 reading from Oracle > -- > > Key: SPARK-42627 > URL: https://issues.apache.org/jira/browse/SPARK-42627 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: melin >Priority: Major > > > {code:java} > Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized > SQL type -102 > at > org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) > > {code} > oracle driver > {code:java} > > com.oracle.database.jdbc > ojdbc8 > 21.9.0.0 > {code} > > oracle sql: > > {code:java} > CREATE TABLE "ORDERS" > ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, > "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, > "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, > "PRICE" NUMBER(10,5) NOT NULL ENABLE, > "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, > "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, > PRIMARY KEY ("ORDER_ID") > USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS > STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 > PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE > DEFAULT CELL_FLASH_CACHE DEFAULT) > TABLESPACE "LOGMINER_TBS" ENABLE, > SUPPLEMENTAL LOG DATA (ALL) COLUMNS > ) SEGMENT CREATION IMMEDIATE > PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING > STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 > PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE > DEFAULT CELL_FLASH_CACHE DEFAULT) > TABLESPACE "LOGMINER_TBS" > > {code} > [~beliefer] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle
[ https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17707619#comment-17707619 ] melin commented on SPARK-42627: --- Unsupported type: TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE [~srowen] > Spark: Getting SQLException: Unsupported type -102 reading from Oracle > -- > > Key: SPARK-42627 > URL: https://issues.apache.org/jira/browse/SPARK-42627 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: melin >Priority: Major > > > {code:java} > Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized > SQL type -102 > at > org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) > > {code} > oracle driver > {code:java} > > com.oracle.database.jdbc > ojdbc8 > 21.9.0.0 > {code} > > oracle sql: > > {code:java} > CREATE TABLE "ORDERS" > ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, > "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, > "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, > "PRICE" NUMBER(10,5) NOT NULL ENABLE, > "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, > "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, > PRIMARY KEY ("ORDER_ID") > USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS > STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 > PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE > DEFAULT CELL_FLASH_CACHE DEFAULT) > TABLESPACE "LOGMINER_TBS" ENABLE, > SUPPLEMENTAL LOG DATA (ALL) COLUMNS > ) SEGMENT CREATION IMMEDIATE > PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING > STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 > PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE > DEFAULT CELL_FLASH_CACHE DEFAULT) > TABLESPACE "LOGMINER_TBS" > > {code} > [~beliefer] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
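Until Spark maps this type itself, a workaround sketch using the public JdbcDialect extension point. Type code -102 is Oracle's TIMESTAMPLTZ; mapping it to TimestampType is an assumption about the desired read semantics, not an official Spark mapping:
{code:java}
import java.util.Locale
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, TimestampType}

// Handle only the vendor type code Spark does not recognize; returning
// None for everything else falls back to the built-in Oracle dialect.
object OracleLtzDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.toLowerCase(Locale.ROOT).startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int,
      typeName: String,
      size: Int,
      md: MetadataBuilder): Option[DataType] =
    if (sqlType == -102) Some(TimestampType) else None // -102 = oracle.jdbc.OracleTypes.TIMESTAMPLTZ
}

// Registered dialects are consulted before the built-in ones, so this
// must run before the DataFrameReader.load call that fails above.
JdbcDialects.registerDialect(OracleLtzDialect)
{code}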
[jira] [Updated] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-38200: -- Description: Upsert SQL differs across databases; most databases support MERGE SQL: sqlserver merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] mysql: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] oracle merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] postgres: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] postgres merge into sql : [https://www.postgresql.org/docs/current/sql-merge.html] db2 merge into sql : [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] derby merge into sql: [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] h2 merge into sql : [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] [~maxgekk] was: Upsert SQL differs across databases; most databases support MERGE SQL: sqlserver merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] mysql: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] oracle merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] postgres: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] postgres merge into sql : [https://www.postgresql.org/docs/current/sql-merge.html] db2 merge into sql : [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] derby merge into sql: [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] h2 merge into sql : [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] [~beliefer] [~cloud_fan] > [SQL] Spark JDBC Savemode Supports Upsert > - > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > Upsert SQL differs across databases; most databases support MERGE SQL: > sqlserver merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] > mysql: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] > oracle merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] > postgres: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] > postgres merge into sql : > [https://www.postgresql.org/docs/current/sql-merge.html] > db2 merge into sql : > [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] > derby merge into sql: > [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] > h2 merge into sql : > [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] > > [~maxgekk] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-38200: -- Description: Upsert SQL differs across databases; most databases support MERGE SQL: sqlserver merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] mysql: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] oracle merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] postgres: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] postgres merge into sql : [https://www.postgresql.org/docs/current/sql-merge.html] db2 merge into sql : [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] derby merge into sql: [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] h2 merge into sql : [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] [~beliefer] [~cloud_fan] was: When writing data into a relational database, data duplication needs to be considered. Both mysql and postgres support upsert syntax. mysql: {code:java} replace into t(id, update_time) values(1, now()); {code} pg: {code:java} INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT (id,name) DO UPDATE SET id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark {code} > [SQL] Spark JDBC Savemode Supports Upsert > - > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > Upsert SQL differs across databases; most databases support MERGE SQL: > sqlserver merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] > mysql: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] > oracle merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] > postgres: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] > postgres merge into sql : > [https://www.postgresql.org/docs/current/sql-merge.html] > db2 merge into sql : > [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] > derby merge into sql: > [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] > h2 merge into sql : > [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] > [~beliefer] [~cloud_fan] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200 ] melin deleted comment on SPARK-38200: --- was (Author: melin): Upsert SQL differs across databases; most databases support MERGE SQL: sqlserver merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] mysql: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] oracle merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] postgres: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] postgres merge into sql : [https://www.postgresql.org/docs/current/sql-merge.html] db2 merge into sql : [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] derby merge into sql: [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] h2 merge into sql : [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] [~beliefer] [~cloud_fan] > [SQL] Spark JDBC Savemode Supports Upsert > - > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > When writing data into a relational database, data duplication needs to be > considered. Both mysql and postgres support upsert syntax. > mysql: > {code:java} > replace into t(id, update_time) values(1, now()); {code} > pg: > {code:java} > INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT > (id,name) DO UPDATE SET > id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-38200: -- Summary: [SQL] Spark JDBC Savemode Supports Upsert (was: [SQL] Spark JDBC Savemode Supports replace) > [SQL] Spark JDBC Savemode Supports Upsert > - > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > When writing data into a relational database, data duplication needs to be > considered. Both mysql and postgres support upsert syntax. > mysql: > {code:java} > replace into t(id, update_time) values(1, now()); {code} > pg: > {code:java} > INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT > (id,name) DO UPDATE SET > id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705765#comment-17705765 ] melin edited comment on SPARK-38200 at 3/28/23 3:00 AM: Upsert SQL differs across databases; most databases support MERGE SQL: sqlserver merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] mysql: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] oracle merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] postgres: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] postgres merge into sql : [https://www.postgresql.org/docs/current/sql-merge.html] db2 merge into sql : [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] derby merge into sql: [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] h2 merge into sql : [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] [~beliefer] [~cloud_fan] was (Author: melin): Upsert SQL differs across databases; most databases support MERGE SQL: sqlserver merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] mysql: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] oracle merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] postgres: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] db2 merge into sql : [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] derby merge into sql: [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] h2 merge into sql : [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] [~beliefer] [~cloud_fan] > [SQL] Spark JDBC Savemode Supports replace > -- > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > When writing data into a relational database, data duplication needs to be > considered. Both mysql and postgres support upsert syntax. > mysql: > {code:java} > replace into t(id, update_time) values(1, now()); {code} > pg: > {code:java} > INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT > (id,name) DO UPDATE SET > id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports replace
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705765#comment-17705765 ] melin commented on SPARK-38200: --- Upsert SQL differs across databases; most databases support MERGE SQL: sqlserver merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] mysql: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] oracle merge into sql : [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] postgres: [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] db2 merge into sql : [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] derby merge into sql: [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] h2 merge into sql : [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] [~beliefer] [~cloud_fan] > [SQL] Spark JDBC Savemode Supports replace > -- > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > When writing data into a relational database, data duplication needs to be > considered. Both mysql and postgres support upsert syntax. > mysql: > {code:java} > replace into t(id, update_time) values(1, now()); {code} > pg: > {code:java} > INSERT INTO %s (id,name,data_time,remark) VALUES ( ?,?,?,? ) ON CONFLICT > (id,name) DO UPDATE SET > id=excluded.id,name=excluded.name,data_time=excluded.data_time,remark=excluded.remark > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle
[ https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-42627: -- Description: {code:java} Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL type -102 at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) {code} oracle driver {code:java} com.oracle.database.jdbc ojdbc8 21.9.0.0 {code} oracle sql: {code:java} CREATE TABLE "ORDERS" ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, "PRICE" NUMBER(10,5) NOT NULL ENABLE, "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, PRIMARY KEY ("ORDER_ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" ENABLE, SUPPLEMENTAL LOG DATA (ALL) COLUMNS ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" {code} [~beliefer] was: {code:java} Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL type -102 at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) {code} oracle sql: {code:java} CREATE TABLE "ORDERS" ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, "PRICE" NUMBER(10,5) NOT NULL ENABLE, "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, PRIMARY KEY ("ORDER_ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE
[jira] [Updated] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle
[ https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-42627: -- Description: {code:java} Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL type -102 at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) {code} oracle sql: {code:java} CREATE TABLE "ORDERS" ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, "PRICE" NUMBER(10,5) NOT NULL ENABLE, "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, PRIMARY KEY ("ORDER_ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" ENABLE, SUPPLEMENTAL LOG DATA (ALL) COLUMNS ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" {code} [~beliefer] was: {code:java} Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL type -102 at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) {code} oracle sql: {code:java} CREATE TABLE "ORDERS" ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, "PRICE" NUMBER(10,5) NOT NULL ENABLE, "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, PRIMARY KEY ("ORDER_ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" ENABLE, SUPPLEMENTAL LOG DATA (ALL) COLUMNS ) SEGMENT CREATION IMMEDIATE
[jira] [Updated] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle
[ https://issues.apache.org/jira/browse/SPARK-42627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-42627: -- Description: {code:java} Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL type -102 at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) {code} oracle sql: {code:java} CREATE TABLE "ORDERS" ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, "PRICE" NUMBER(10,5) NOT NULL ENABLE, "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, PRIMARY KEY ("ORDER_ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" ENABLE, SUPPLEMENTAL LOG DATA (ALL) COLUMNS ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" {code} was: ``` Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL type -102 at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) ``` oracle sql: ```sql CREATE TABLE "ORDERS" ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, "PRICE" NUMBER(10,5) NOT NULL ENABLE, "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, PRIMARY KEY ("ORDER_ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" ENABLE, SUPPLEMENTAL LOG DATA (ALL) COLUMNS ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" ```
[jira] [Created] (SPARK-42627) Spark: Getting SQLException: Unsupported type -102 reading from Oracle
melin created SPARK-42627: - Summary: Spark: Getting SQLException: Unsupported type -102 reading from Oracle Key: SPARK-42627 URL: https://issues.apache.org/jira/browse/SPARK-42627 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.2 Reporter: melin ``` Exception in thread "main" org.apache.spark.SparkSQLException: Unrecognized SQL type -102 at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:832) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:225) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$getSchema$1(JdbcUtils.scala:308) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:308) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:70) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228) at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) ``` oracle sql: ```sql CREATE TABLE "ORDERS" ( "ORDER_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_DATE" TIMESTAMP (3) WITH LOCAL TIME ZONE NOT NULL ENABLE, "CUSTOMER_NAME" VARCHAR2(255) NOT NULL ENABLE, "PRICE" NUMBER(10,5) NOT NULL ENABLE, "PRODUCT_ID" NUMBER(9,0) NOT NULL ENABLE, "ORDER_STATUS" NUMBER(1,0) NOT NULL ENABLE, PRIMARY KEY ("ORDER_ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" ENABLE, SUPPLEMENTAL LOG DATA (ALL) COLUMNS ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "LOGMINER_TBS" ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
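JDBC type code -102 is not part of the java.sql.Types standard; it is Oracle's driver-specific code for TIMESTAMP WITH LOCAL TIME ZONE (oracle.jdbc.OracleTypes.TIMESTAMPLTZ), which is exactly the type of the ORDER_DATE column in the DDL above, so JdbcUtils.getCatalystType has no mapping for it. Until Spark's built-in Oracle dialect handles this code, one possible workaround is a custom JdbcDialect; the sketch below is untested, and mapping the type to TimestampType is an assumption about the desired semantics.

{code:java}
import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.MetadataBuilder;
import scala.Option;

// Workaround sketch: map Oracle's TIMESTAMPLTZ (JDBC type code -102) to Catalyst's
// TimestampType so that schema resolution in JdbcUtils.getSchema no longer fails.
public class OracleTimestampLtzDialect extends JdbcDialect {
  @Override
  public boolean canHandle(String url) {
    return url.toLowerCase().startsWith("jdbc:oracle");
  }

  @Override
  public Option<DataType> getCatalystType(int sqlType, String typeName, int size,
      MetadataBuilder md) {
    if (sqlType == -102) { // oracle.jdbc.OracleTypes.TIMESTAMPLTZ
      return Option.apply(DataTypes.TimestampType);
    }
    return Option.empty(); // defer to Spark's default mappings
  }
}

// Register before calling spark.read().format("jdbc")...:
// org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(new OracleTimestampLtzDialect());
{code}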
[jira] [Created] (SPARK-40568) Spark Streaming support Debezium
melin created SPARK-40568: - Summary: Spark Streaming support Debezium Key: SPARK-40568 URL: https://issues.apache.org/jira/browse/SPARK-40568 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.4.0 Reporter: melin Debezium is a very popular CDC technology. If Spark Structured Streaming supported Debezium, it would make writing change data to data lakes straightforward. The most commonly used solution for this today is Flink CDC; we hope Spark can support it as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
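Until native support exists, Debezium change events can already be consumed through Structured Streaming's existing Kafka source by parsing the Debezium JSON envelope manually. A minimal sketch, assuming the spark-sql-kafka connector is on the classpath; the broker address, topic name, and simplified envelope schema below are all hypothetical:

{code:java}
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class DebeziumEnvelopeSketch {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("debezium-cdc-sketch").getOrCreate();

    // Simplified row image; a real Debezium payload carries more metadata (source, ts_ms, ...).
    StructType rowImage = new StructType()
        .add("order_id", DataTypes.LongType)
        .add("customer_name", DataTypes.StringType);
    StructType envelope = new StructType()
        .add("before", rowImage)
        .add("after", rowImage)
        .add("op", DataTypes.StringType); // c = create, u = update, d = delete

    Dataset<Row> changes = spark.readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
        .option("subscribe", "server1.inventory.orders")     // hypothetical Debezium topic
        .load()
        .select(from_json(col("value").cast("string"), envelope).alias("e"))
        .select(col("e.op").alias("op"), col("e.after.*"));

    // A data lake sink would merge these rows by op code; console is used here for brevity.
    changes.writeStream().format("console").start().awaitTermination();
  }
}
{code}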
[jira] [Commented] (SPARK-40189) Support json_array_get/json_array_length function
[ https://issues.apache.org/jira/browse/SPARK-40189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583473#comment-17583473 ] melin commented on SPARK-40189: --- [~maxgekk] > Support json_array_get/json_array_length function > - > > Key: SPARK-40189 > URL: https://issues.apache.org/jira/browse/SPARK-40189 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: melin >Priority: Major > > Presto provides these two functions, which are frequently used: > https://prestodb.io/docs/current/functions/json.html#json-functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40190) Support json_array_get and json_array_length function
[ https://issues.apache.org/jira/browse/SPARK-40190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin resolved SPARK-40190. --- Resolution: Duplicate > Support json_array_get and json_array_length function > - > > Key: SPARK-40190 > URL: https://issues.apache.org/jira/browse/SPARK-40190 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: melin >Priority: Major > > Presto provides these two functions, which are often used: > https://prestodb.io/docs/current/functions/json.html#json-functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40190) Support json_array_get and json_array_length function
melin created SPARK-40190: - Summary: Support json_array_get and json_array_length function Key: SPARK-40190 URL: https://issues.apache.org/jira/browse/SPARK-40190 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: melin Presto provides these two functions, which are often used: https://prestodb.io/docs/current/functions/json.html#json-functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40189) Support json_array_get/json_array_length function
melin created SPARK-40189: - Summary: Support json_array_get/json_array_length function Key: SPARK-40189 URL: https://issues.apache.org/jira/browse/SPARK-40189 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: melin Presto provides these two functions, which are frequently used: https://prestodb.io/docs/current/functions/json.html#json-functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
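For reference, the requested Presto semantics can be approximated with functions Spark already ships (recent Spark versions appear to include a built-in json_array_length as well, so json_array_get is the main gap). A rough sketch of both approximations:

{code:java}
import org.apache.spark.sql.SparkSession;

public class JsonArrayFunctionsSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("json-array-sketch").getOrCreate();

    // Presto: json_array_length('[1, 2, 3]') -> 3
    // Approximation: parse the array and take its size.
    spark.sql("SELECT size(from_json('[1, 2, 3]', 'array<int>')) AS len").show();

    // Presto: json_array_get('[\"a\", \"b\", \"c\"]', 1) -> "b"
    // Approximation: get_json_object with a 0-based JSONPath subscript.
    spark.sql("SELECT get_json_object('[\"a\", \"b\", \"c\"]', '$[1]') AS elem").show();

    spark.stop();
  }
}
{code}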
[jira] [Created] (SPARK-40184) Support modify the comment of a partitioned column
melin created SPARK-40184: - Summary: Support modify the comment of a partitioned column Key: SPARK-40184 URL: https://issues.apache.org/jira/browse/SPARK-40184 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: melin A comment may not have been added to a partition column when the table was created; it should be possible to modify a partition column's comment afterwards (see the sketch after this message). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
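For context, Spark SQL already exposes ALTER TABLE ... ALTER COLUMN ... COMMENT for regular columns; the request is for the same statement to work on a partition column. A sketch of the desired usage, with an invented table and column names:

{code:java}
import org.apache.spark.sql.SparkSession;

public class PartitionColumnCommentSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("partition-comment-sketch").getOrCreate();

    spark.sql("CREATE TABLE sales (id BIGINT, amount DOUBLE, dt STRING) "
        + "USING parquet PARTITIONED BY (dt)");

    // Supported today for a regular column:
    spark.sql("ALTER TABLE sales ALTER COLUMN amount COMMENT 'order amount in USD'");

    // The improvement requested here: the same statement on a partition column
    // (this may currently be rejected, which is the point of the issue).
    spark.sql("ALTER TABLE sales ALTER COLUMN dt COMMENT 'partition date, yyyy-MM-dd'");

    spark.stop();
  }
}
{code}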
[jira] [Created] (SPARK-40118) InMemoryFileIndex caches file lists; how to keep file lists in sync across multiple long-running SparkSessions
melin created SPARK-40118: - Summary: InMemoryFileIndex caches file lists; how to keep file lists in sync across multiple long-running SparkSessions Key: SPARK-40118 URL: https://issues.apache.org/jira/browse/SPARK-40118 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: melin For example, take two SparkSessions A and B: A queries table T1 while B writes data to T1, and A then fails to see the data written by B. There are currently two workarounds: 1. Close SparkSession A and restart it. 2. Run the REFRESH TABLE command (see the sketch after this message). Neither is feasible for business users, who do not know when a refresh is needed, and frequent refreshes hurt interactive performance. Ideally Spark would support a centralized caching scheme such as Redis, providing an extension interface that allows the cache to be customized. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
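The second workaround looks like this in practice; a minimal sketch, assuming a table t1 that another session has just written to:

{code:java}
import org.apache.spark.sql.SparkSession;

public class RefreshTableSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("refresh-sketch").getOrCreate();

    // In session A, after session B has written new files to t1, the cached
    // InMemoryFileIndex listing must be invalidated manually:
    spark.sql("REFRESH TABLE t1");
    // or equivalently through the catalog API:
    spark.catalog().refreshTable("t1");

    spark.stop();
  }
}
{code}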
[jira] [Updated] (SPARK-39990) Restrict special characters in field name, which can be controlled by switches
[ https://issues.apache.org/jira/browse/SPARK-39990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-39990: -- Description: The Hive metastore restricts field names to alphanumerics and underscores. If a custom catalog does not use HMS (for example, a custom metadata system based on Iceberg), these restrictions may not apply, such as when reading Excel data and writing an Iceberg table. A hacky way to disable the checks today: {code:java} @Around("execution(public * org.apache.spark.sql.execution.datasources.DataSourceUtils.checkFieldNames(..))") public void checkFieldNames_1(ProceedingJoinPoint pjp) throws Throwable { LOG.info("skip checkFieldNames 1"); } @Around("execution(public * org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldNames(..))") public void checkFieldNames_2(ProceedingJoinPoint pjp) throws Throwable { LOG.info("skip checkFieldNames 2"); } @Around("execution(public * org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldName(..))") public void checkFieldNames_3(ProceedingJoinPoint pjp) throws Throwable { LOG.info("skip checkFieldNames 3"); }{code} CREATE OR REPLACE TABLE huaixin_rp.bigdata.parquet_orders_rp5 USING ICEBERG select 12 as id, 'ceity' as `address(地 址)` [~hyukjin.kwon] was: The Hive metastore restricts field names to alphanumerics and underscores. If a custom catalog does not use HMS (for example, a custom metadata system based on Iceberg), these restrictions may not apply, such as when reading Excel data and writing an Iceberg table, and column names are prone to special characters such as spaces, parentheses, etc. A hacky way to disable the checks today: {code:java} @Around("execution(public * org.apache.spark.sql.execution.datasources.DataSourceUtils.checkFieldNames(..))") public void checkFieldNames_1(ProceedingJoinPoint pjp) throws Throwable { LOG.info("skip checkFieldNames 1"); } @Around("execution(public * org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldNames(..))") public void checkFieldNames_2(ProceedingJoinPoint pjp) throws Throwable { LOG.info("skip checkFieldNames 2"); } @Around("execution(public * org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldName(..))") public void checkFieldNames_3(ProceedingJoinPoint pjp) throws Throwable { LOG.info("skip checkFieldNames 3"); }{code} CREATE OR REPLACE TABLE huaixin_rp.bigdata.parquet_orders_rp5 USING PARQUET select 12 as id, 'ceity' as `address(地 址)` [~hyukjin.kwon] > Restrict special characters in field name, which can be controlled by switches > --- > > Key: SPARK-39990 > URL: https://issues.apache.org/jira/browse/SPARK-39990 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: melin >Priority: Major > > The Hive metastore restricts field names to alphanumerics and underscores. If a custom > catalog does not use HMS (for example, a custom metadata system based on Iceberg), these > restrictions may not apply, such as when reading Excel data and writing an Iceberg table.
> A hacky way to disable the checks today: > {code:java} > @Around("execution(public * > org.apache.spark.sql.execution.datasources.DataSourceUtils.checkFieldNames(..))") > public void checkFieldNames_1(ProceedingJoinPoint pjp) throws Throwable { > LOG.info("skip checkFieldNames 1"); > } > @Around("execution(public * > org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldNames(..))") > public void checkFieldNames_2(ProceedingJoinPoint pjp) throws Throwable { > LOG.info("skip checkFieldNames 2"); > } > @Around("execution(public * > org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldName(..))") > public void checkFieldNames_3(ProceedingJoinPoint pjp) throws Throwable > { LOG.info("skip checkFieldNames 3"); }{code} > CREATE OR REPLACE TABLE huaixin_rp.bigdata.parquet_orders_rp5 USING ICEBERG > select 12 as id, 'ceity' as `address(地 址)` > [~hyukjin.kwon] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
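For completeness, a minimal reproduction of the check that the aspect above bypasses; depending on the Spark version and file format, the write below is rejected by field-name validation (e.g. in DataSourceUtils.checkFieldNames) with an invalid-character error. The output path is illustrative:

{code:java}
import org.apache.spark.sql.SparkSession;

public class FieldNameCheckRepro {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("fieldname-repro").getOrCreate();

    // The column alias contains parentheses and a space, which Hive-style
    // field-name validation rejects on write.
    spark.sql("SELECT 12 AS id, 'ceity' AS `address(地 址)`")
        .write().format("parquet").save("/tmp/parquet_orders_rp5"); // illustrative path

    spark.stop();
  }
}
{code}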