[jira] [Commented] (SPARK-38536) Spark 3 can not read mixed format partitions
[ https://issues.apache.org/jira/browse/SPARK-38536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506728#comment-17506728 ] Deegue commented on SPARK-38536: I wonder if it's caused by the different Hadoop versions your ORC and Parquet readers are based on. Can you check it? [~songhuicheng] > Spark 3 can not read mixed format partitions > > > Key: SPARK-38536 > URL: https://issues.apache.org/jira/browse/SPARK-38536 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.2.1 >Reporter: Huicheng Song >Priority: Major > > Spark 3.x reads partitions with the table's input format, which fails when a > partition has a different input format than the table. > This is a regression introduced by SPARK-26630. Before that fix, Spark used > the partition's InputFormat when creating a HadoopRDD. With that fix, Spark > uses only the table's InputFormat when creating a HadoopRDD, causing failures. > Reading mixed-format partitions is an important scenario, especially for > format migration. It is also well supported in query engines like Hive and Presto. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
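For readers unfamiliar with the failure mode, here is a minimal, purely illustrative Python sketch (not Spark source; all names and format strings are hypothetical stand-ins) of the difference between resolving the InputFormat per partition versus per table:

```python
# Illustrative only: how a mixed-format read breaks when the reader resolves
# the InputFormat at the table level instead of per partition.

TABLE_INPUT_FORMAT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"

# Per-partition InputFormat overrides; None means "inherit the table's format".
PARTITION_FORMATS = {
    "dt=2022-03-01": "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",  # migrated partition
    "dt=2022-03-02": None,
}

def resolve_input_format(partition, per_partition=True):
    """Pick the InputFormat used to read one partition."""
    override = PARTITION_FORMATS.get(partition)
    if per_partition and override is not None:
        return override           # pre-SPARK-26630 behavior: honor the partition
    return TABLE_INPUT_FORMAT     # post-SPARK-26630 behavior: table format only
```

With `per_partition=False`, the ORC partition would be read with the Parquet InputFormat, which is the mismatch the ticket describes.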
[jira] [Commented] (SPARK-38536) Spark 3 can not read mixed format partitions
[ https://issues.apache.org/jira/browse/SPARK-38536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506678#comment-17506678 ] Deegue commented on SPARK-38536: Thanks [~hyukjin.kwon], [~songhuicheng]. I don't think this PR would change how we read a table through its InputFormat. Can you describe this issue in more detail, e.g. the exception or the related code? > Spark 3 can not read mixed format partitions > > > Key: SPARK-38536 > URL: https://issues.apache.org/jira/browse/SPARK-38536 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.2.1 >Reporter: Huicheng Song >Priority: Major > > Spark 3.x reads partitions with the table's input format, which fails when a > partition has a different input format than the table. > This is a regression introduced by SPARK-26630. Before that fix, Spark used > the partition's InputFormat when creating a HadoopRDD. With that fix, Spark > uses only the table's InputFormat when creating a HadoopRDD, causing failures. > Reading mixed-format partitions is an important scenario, especially for > format migration. It is also well supported in query engines like Hive and Presto.
[jira] [Created] (SPARK-29910) Add minimum runtime limit to speculation
Deegue created SPARK-29910: -- Summary: Add minimum runtime limit to speculation Key: SPARK-29910 URL: https://issues.apache.org/jira/browse/SPARK-29910 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Deegue The minimum runtime for speculation used to be a fixed value of 100 ms. This means tasks that finish within seconds will also be speculated, and more executors will be required. To resolve this, we add `spark.speculation.minRuntime` to control the minimum runtime limit for speculation. By adjusting `spark.speculation.minRuntime`, we can reduce the number of normal tasks that get speculated.
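A hedged sketch of the proposed check, assuming a `spark.speculation.minRuntime`-style floor layered on top of a median-based threshold (the function, multiplier, and quantile logic here are illustrative, not Spark's actual scheduler code):

```python
# Illustrative speculation heuristic: a running task qualifies for a
# speculative copy only if it exceeds BOTH the median-based threshold
# and an absolute minimum-runtime floor.
import statistics

def speculation_candidates(running_ms, finished_ms, multiplier=1.5, min_runtime_ms=0):
    """Return running-task runtimes (ms) that qualify for speculation."""
    if not finished_ms:
        return []  # nothing finished yet: no baseline to compare against
    threshold = max(multiplier * statistics.median(finished_ms), min_runtime_ms)
    return [t for t in running_ms if t > threshold]
```

With a tiny floor (the old fixed 100 ms), short tasks that are only marginally slower than the median still get speculated; raising the floor filters them out.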
[jira] [Created] (SPARK-29786) Fix MetaException when dropping a partition not exists on HDFS.
Deegue created SPARK-29786: -- Summary: Fix MetaException when dropping a partition not exists on HDFS. Key: SPARK-29786 URL: https://issues.apache.org/jira/browse/SPARK-29786 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Deegue When we drop a partition which doesn't exist on HDFS, we receive a `MetaException`, even though the partition has actually been dropped. In Hive, no exception is thrown in this case. For example, if we execute alter table test.tmp drop partition(stat_day=20190516); (the partition stat_day=20190516 exists in the Hive metastore, but doesn't exist on HDFS), we get: {code:java} Error: Error running query: MetaException(message:File does not exist: /user/hive/warehouse/test.db/tmp/stat_day=20190516 at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getContentSummary(FSDirectory.java:2414) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:4719) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1237) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getContentSummary(AuthorizationProviderProxyClientProtocol.java:568) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:896) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2278) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2274) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2274) ) (state=,code=0) {code}
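The Hive-like behavior the ticket asks for can be sketched as follows; the dict-based metastore and set-based filesystem are stand-ins for illustration, not real Hive/HDFS APIs:

```python
# Sketch: drop the partition from the metastore even when its HDFS
# directory is already gone, instead of surfacing a MetaException.

def drop_partition(metastore, fs_paths, table, spec):
    """Remove a partition from the (fake) metastore; tolerate missing data dirs."""
    path = metastore[table].pop(spec)  # remove the metastore entry first
    try:
        fs_paths.remove(path)          # then delete the partition directory
    except KeyError:
        pass  # directory already missing on HDFS: not an error, as in Hive
```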
[jira] [Created] (SPARK-29785) Optimize opening a new session of Spark Thrift Server
Deegue created SPARK-29785: -- Summary: Optimize opening a new session of Spark Thrift Server Key: SPARK-29785 URL: https://issues.apache.org/jira/browse/SPARK-29785 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Deegue When we open a new session of Spark Thrift Server, `use default` is called and a free executor is needed to execute the SQL. This behavior adds ~5 seconds to opening a new session, which should only cost ~100 ms.
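One possible direction (an assumption on our part, not a committed design for this ticket) is to treat `use default` as a metadata-only statement that the driver can answer from the catalog without waiting for a free executor. A toy classifier, with an entirely hypothetical statement list:

```python
# Hypothetical routing: statements matching this pattern could be handled
# on the driver (catalog-only), skipping the wait for a free executor.
import re

METADATA_ONLY = re.compile(r"^\s*(use|set|show\s+databases)\b", re.IGNORECASE)

def runs_on_driver_only(sql: str) -> bool:
    """True if the statement can (in this sketch) be served without executors."""
    return bool(METADATA_ONLY.match(sql))
```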
[jira] [Updated] (SPARK-28239) Allow TCP connections created by shuffle service auto close on YARN NodeManagers
[ https://issues.apache.org/jira/browse/SPARK-28239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-28239: --- Summary: Allow TCP connections created by shuffle service auto close on YARN NodeManagers (was: Make TCP connections created by shuffle service auto close on YARN NodeManagers) > Allow TCP connections created by shuffle service auto close on YARN > NodeManagers > > > Key: SPARK-28239 > URL: https://issues.apache.org/jira/browse/SPARK-28239 > Project: Spark > Issue Type: Improvement > Components: Shuffle, YARN >Affects Versions: 2.4.0 > Environment: Hadoop2.6.0-CDH5.8.3(netty3) > Spark2.4.0(netty4) > Configs: > spark.shuffle.service.enabled=true >Reporter: Deegue >Priority: Minor > Attachments: screenshot-1.png, screenshot-2.png > > > When executing shuffle tasks, TCP connections (on port 7337 by default) will > be established by the shuffle service. > It looks like: > !screenshot-1.png! > However, some of the TCP connections are still busy after the task has actually > finished. These connections won't close automatically until we restart the > NodeManager process. > Connections pile up and NodeManagers get slower and slower. > !screenshot-2.png! > These unclosed TCP connections stay busy, and setting ChannelOption.SO_KEEPALIVE > to true according to [SPARK-23182|https://github.com/apache/spark/pull/20512] > doesn't seem to take effect. > So the solution is to set ChannelOption.AUTO_CLOSE to true, after which > our cluster (running 1+ jobs / day) processes normally.
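As an analogy only (Python stand-ins, not Netty's actual API beyond the option name): Netty's AUTO_CLOSE means a channel is closed automatically when a write fails, rather than lingering half-open and piling up on the NodeManager:

```python
# Analogy for ChannelOption.AUTO_CLOSE: close a channel on write failure
# instead of leaving a dead connection open. FakeChannel is a stand-in.

class FakeChannel:
    def __init__(self, healthy=True):
        self.healthy, self.closed = healthy, False
    def write(self, data):
        if not self.healthy:
            raise IOError("write failed")
    def close(self):
        self.closed = True

def write_with_auto_close(channel, data, auto_close=True):
    """Attempt a write; with auto_close, tear the channel down on failure."""
    try:
        channel.write(data)
        return True
    except IOError:
        if auto_close:
            channel.close()  # don't leave the dead connection lingering
        return False
```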
[jira] [Created] (SPARK-28239) Make TCP connections created by shuffle service auto close on YARN NodeManagers
Deegue created SPARK-28239: -- Summary: Make TCP connections created by shuffle service auto close on YARN NodeManagers Key: SPARK-28239 URL: https://issues.apache.org/jira/browse/SPARK-28239 Project: Spark Issue Type: Improvement Components: Shuffle, YARN Affects Versions: 2.4.0 Environment: Hadoop2.6.0-CDH5.8.3(netty3) Spark2.4.0(netty4) Reporter: Deegue -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26667) Add `Scanning Input Table` to Performance Tuning Guide
[ https://issues.apache.org/jira/browse/SPARK-26667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-26667: --- Description: We can use `CombineTextInputFormat` instead of `TextInputFormat` and set configurations to increase the speed while reading a table. There's no need to add spark configurations, [PR#23506|https://github.com/apache/spark/pull/23506], so add it to the Performance Tuning. was: We can use `CombineTextInputFormat` instead of `TextInputFormat` and set configurations to increase the speed while reading a table. There's no need to add spark configurations, [link title|[https://github.com/apache/spark/pull/23506],] so add it to the Performance Tuning. > Add `Scanning Input Table` to Performance Tuning Guide > -- > > Key: SPARK-26667 > URL: https://issues.apache.org/jira/browse/SPARK-26667 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.1, 3.0.0 >Reporter: Deegue >Priority: Minor > > We can use `CombineTextInputFormat` instead of `TextInputFormat` and set > configurations to increase the speed while reading a table. > There's no need to add spark configurations, > [PR#23506|https://github.com/apache/spark/pull/23506], so add it to the > Performance Tuning. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26667) Add `Scanning Input Table` to Performance Tuning Guide
Deegue created SPARK-26667: -- Summary: Add `Scanning Input Table` to Performance Tuning Guide Key: SPARK-26667 URL: https://issues.apache.org/jira/browse/SPARK-26667 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.4.1, 3.0.0 Reporter: Deegue We can use `CombineTextInputFormat` instead of `TextInputFormat` and set configurations to increase the speed of reading a table. There's no need to add Spark configurations, see [PR #23506|https://github.com/apache/spark/pull/23506], so add it to the Performance Tuning guide. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
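To illustrate why the issue above recommends `CombineTextInputFormat`, here is a minimal, self-contained sketch (not Spark or Hadoop source) of how combining small files into bounded splits shrinks the split count; the 256 MB cap mirrors the `mapreduce.input.fileinputformat.split.maxsize=268435456` value suggested in the related issues, and the greedy packing is a simplification of Hadoop's actual logic.

```python
MAX_SPLIT = 268435456  # mapreduce.input.fileinputformat.split.maxsize (256 MB)

def combine_splits(file_sizes, max_split=MAX_SPLIT):
    """Greedily pack files into splits no larger than max_split bytes."""
    splits, current, current_size = [], [], 0
    for size in file_sizes:
        # Flush the current split once adding the next file would exceed the cap.
        if current and current_size + size > max_split:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits

# 1000 small files of 1 MB each: plain TextInputFormat would create roughly
# one split (and one task) per file; combining yields far fewer, larger splits.
files = [1 << 20] * 1000
combined = combine_splits(files)
print(len(files), "files ->", len(combined), "combined splits")
```

Fewer splits means fewer short-lived tasks, which is the speedup the tuning guide entry is about.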
[jira] [Commented] (SPARK-26630) ClassCastException in TableReader while creating HadoopRDD
[ https://issues.apache.org/jira/browse/SPARK-26630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746985#comment-16746985 ] Deegue commented on SPARK-26630: Hi [~dongjoon], It seems all of the versions are affected, because we only used HadoopRDD before. And I don't know which `Affects Version/s:` to choose... > ClassCastException in TableReader while creating HadoopRDD > -- > > Key: SPARK-26630 > URL: https://issues.apache.org/jira/browse/SPARK-26630 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0, 2.4.1, 3.0.0 >Reporter: Deegue >Priority: Major > > This bug found in [link title|https://github.com/apache/spark/pull/23506] (PR > #23506). > It will throw ClassCastException when we use new input format (eg. > `org.apache.hadoop.mapreduce.InputFormat`) to create HadoopRDD.So we need to > use NewHadoopRDD to deal with this input format in TableReader.scala. > Exception : > {noformat} > Caused by: java.lang.ClassCastException: > org.apache.hadoop.mapreduce.lib.input.TextInputFormat cannot be cast to > org.apache.hadoop.mapred.InputFormat > at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at > 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at org.apache.spark.ShuffleDependency.(Dependency.scala:96) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.prepareShuffleDependency(ShuffleExchangeExec.scala:343) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:101) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.$anonfun$doExecute$1(ShuffleExchangeExec.scala:137) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52) > ... 87 more > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26630) ClassCastException in TableReader while creating HadoopRDD
[ https://issues.apache.org/jira/browse/SPARK-26630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-26630: --- Affects Version/s: 2.4.1 2.4.0 > ClassCastException in TableReader while creating HadoopRDD > -- > > Key: SPARK-26630 > URL: https://issues.apache.org/jira/browse/SPARK-26630 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0, 2.4.1, 3.0.0 >Reporter: Deegue >Priority: Major > > This bug found in [link title|https://github.com/apache/spark/pull/23506] (PR > #23506). > It will throw ClassCastException when we use new input format (eg. > `org.apache.hadoop.mapreduce.InputFormat`) to create HadoopRDD.So we need to > use NewHadoopRDD to deal with this input format in TableReader.scala. > Exception : > {noformat} > Caused by: java.lang.ClassCastException: > org.apache.hadoop.mapreduce.lib.input.TextInputFormat cannot be cast to > org.apache.hadoop.mapred.InputFormat > at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at 
org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) > at scala.Option.getOrElse(Option.scala:138) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) > at org.apache.spark.ShuffleDependency.(Dependency.scala:96) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.prepareShuffleDependency(ShuffleExchangeExec.scala:343) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:101) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.$anonfun$doExecute$1(ShuffleExchangeExec.scala:137) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52) > ... 87 more > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26630) ClassCastException in TableReader while creating HadoopRDD
Deegue created SPARK-26630: -- Summary: ClassCastException in TableReader while creating HadoopRDD Key: SPARK-26630 URL: https://issues.apache.org/jira/browse/SPARK-26630 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Deegue This bug was found in [PR #23506|https://github.com/apache/spark/pull/23506]. It will throw a ClassCastException when we use the new input format API (e.g. `org.apache.hadoop.mapreduce.InputFormat`) to create a HadoopRDD. So we need to use NewHadoopRDD to handle this input format in TableReader.scala. Exception: {noformat} Caused by: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TextInputFormat cannot be cast to org.apache.hadoop.mapred.InputFormat at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) at scala.Option.getOrElse(Option.scala:138) at 
org.apache.spark.rdd.RDD.partitions(RDD.scala:252) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254) at scala.Option.getOrElse(Option.scala:138) at org.apache.spark.rdd.RDD.partitions(RDD.scala:252) at org.apache.spark.ShuffleDependency.(Dependency.scala:96) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.prepareShuffleDependency(ShuffleExchangeExec.scala:343) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:101) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.$anonfun$doExecute$1(ShuffleExchangeExec.scala:137) at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52) ... 87 more {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
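The fix the issue describes (done in Spark's TableReader.scala) is to dispatch on which InputFormat API the configured class implements instead of unconditionally casting to the old `org.apache.hadoop.mapred.InputFormat`. A simplified Python sketch of that dispatch, with stand-in classes for the two Hadoop interfaces (this is illustrative, not the actual Scala code):

```python
class OldInputFormat:
    """Stand-in for org.apache.hadoop.mapred.InputFormat (old API)."""

class NewInputFormat:
    """Stand-in for org.apache.hadoop.mapreduce.InputFormat (new API)."""

class NewTextInputFormat(NewInputFormat):
    """Stand-in for org.apache.hadoop.mapreduce.lib.input.TextInputFormat."""

def create_rdd(input_format_class):
    """Pick the RDD type by the InputFormat API instead of blindly casting."""
    if issubclass(input_format_class, OldInputFormat):
        return "HadoopRDD"       # old mapred API
    elif issubclass(input_format_class, NewInputFormat):
        return "NewHadoopRDD"    # new mapreduce API
    raise TypeError(f"{input_format_class.__name__} is not an InputFormat")

# The new-API TextInputFormat from the stack trace now takes the
# NewHadoopRDD path instead of failing the cast.
print(create_rdd(NewTextInputFormat))  # -> NewHadoopRDD
```

The ClassCastException in the trace is exactly what the unconditional-cast path produced when handed a new-API class.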
[jira] [Updated] (SPARK-26577) Add input optimizer when reading Hive table by SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-26577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-26577: --- Description: When using SparkSQL, for example the ThriftServer, if we set `spark.sql.hive.inputFormat.optimizer.enabled=true` we can optimize the InputFormat to CombineTextInputFormat automatically if it's TextInputFormat before. And we can also change the max/min size of input splits by setting, for example `mapreduce.input.fileinputformat.split.maxsize=268435456` `mapreduce.input.fileinputformat.split.minsize=134217728` Otherwise, we have to modify Hive Configs and structure of tables. was: When using SparkSQL, for example the ThriftServer, if we set `spark.sql.hive.fileInputFormat.enabled=true` we can optimize the InputFormat to CombineTextInputFormat automatically if it's TextInputFormat before. And we can also change the max/min size of input splits by setting, for example `spark.sql.hive.fileInputFormat.split.maxsize=268435456` `spark.sql.hive.fileInputFormat.split.minsize=134217728` Otherwise, we have to modify Hive Configs and structure of tables. > Add input optimizer when reading Hive table by SparkSQL > --- > > Key: SPARK-26577 > URL: https://issues.apache.org/jira/browse/SPARK-26577 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: Deegue >Priority: Minor > > When using SparkSQL, for example the ThriftServer, if we set > `spark.sql.hive.inputFormat.optimizer.enabled=true` > we can optimize the InputFormat to CombineTextInputFormat automatically if > it's TextInputFormat before. And we can also change the max/min size of input > splits by setting, for example > `mapreduce.input.fileinputformat.split.maxsize=268435456` > `mapreduce.input.fileinputformat.split.minsize=134217728` > > Otherwise, we have to modify Hive Configs and structure of tables. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26577) Add input optimizer when reading Hive table by SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-26577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-26577: --- External issue URL: https://github.com/apache/spark/pull/23496 > Add input optimizer when reading Hive table by SparkSQL > --- > > Key: SPARK-26577 > URL: https://issues.apache.org/jira/browse/SPARK-26577 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: Deegue >Priority: Minor > > When using SparkSQL, for example the ThriftServer, if we set > `spark.sql.hive.fileInputFormat.enabled=true` > we can optimize the InputFormat to CombineTextInputFormat automatically if > it's TextInputFormat before. And we can also change the max/min size of input > splits by setting, for example > `spark.sql.hive.fileInputFormat.split.maxsize=268435456` > `spark.sql.hive.fileInputFormat.split.minsize=134217728` > > Otherwise, we have to modify Hive Configs and structure of tables. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26577) Add input optimizer when reading Hive table by SparkSQL
Deegue created SPARK-26577: -- Summary: Add input optimizer when reading Hive table by SparkSQL Key: SPARK-26577 URL: https://issues.apache.org/jira/browse/SPARK-26577 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.1 Reporter: Deegue When using SparkSQL, for example the ThriftServer, if we set `spark.sql.hive.fileInputFormat.enabled=true`, the InputFormat will automatically be optimized to CombineTextInputFormat when it was TextInputFormat before. And we can also change the max/min size of input splits by setting, for example, `spark.sql.hive.fileInputFormat.split.maxsize=268435456` `spark.sql.hive.fileInputFormat.split.minsize=134217728` Otherwise, we would have to modify Hive configs and the structure of tables. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
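The proposed optimizer's behavior can be sketched as a single rewrite rule: when the flag is on and the table's InputFormat is TextInputFormat, substitute CombineTextInputFormat and let the split-size bounds apply. The flag and config names below are taken from the updated issue description (`spark.sql.hive.inputFormat.optimizer.enabled` plus the standard `mapreduce.input.fileinputformat.split.*` settings); this is a hypothetical sketch, not the actual Spark implementation.

```python
TEXT = "org.apache.hadoop.mapred.TextInputFormat"
COMBINE = "org.apache.hadoop.mapred.lib.CombineTextInputFormat"

def optimize_input_format(input_format, conf):
    """Swap TextInputFormat for CombineTextInputFormat when enabled."""
    enabled = conf.get("spark.sql.hive.inputFormat.optimizer.enabled", "false")
    if enabled == "true" and input_format == TEXT:
        return COMBINE
    return input_format  # leave other formats (ORC, Parquet, ...) untouched

conf = {
    "spark.sql.hive.inputFormat.optimizer.enabled": "true",
    # Split-size bounds from the description, honored by the combined format:
    "mapreduce.input.fileinputformat.split.maxsize": "268435456",
    "mapreduce.input.fileinputformat.split.minsize": "134217728",
}
print(optimize_input_format(TEXT, conf))
```

The point of the rule is that it works per-query at read time, so no Hive metastore or table DDL changes are needed.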
[jira] [Reopened] (SPARK-24672) No job is running but there are active tasks
[ https://issues.apache.org/jira/browse/SPARK-24672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue reopened SPARK-24672: Conditions for this issue: When the amount of data I select is larger than spark.driver.maxResultSize, it returns the info below and the job fails automatically. !image4.png! After that, several active tasks remain and occupy executors. !image2.png! Thanks a lot for your comment. > No job is running but there are active tasks > > > Key: SPARK-24672 > URL: https://issues.apache.org/jira/browse/SPARK-24672 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 2.2.1 > Environment: hadoop 2.6.0 > spark 2.2.1 > CDH 5.8.3 > java 1.8.0 > > More info : > image1.png in Attachments >Reporter: Deegue >Priority: Major > Attachments: image1.png, image2.png, image3.png, image4.png > > Original Estimate: 120h > Remaining Estimate: 120h > > Issues: > > There are active tasks while no job is running. > These active tasks occupy the executors(resources) , and I don't know why > they haven't been killed or stopped after its jobs failed. > > More info : > image2.png & image3.png in Attachments > > I'd be very appreciated it if anyone can help me... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
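The failure mode in the reopening comment can be sketched as follows: the driver accumulates the sizes of fetched task results and aborts the job once the running total exceeds spark.driver.maxResultSize, while tasks whose results were not yet fetched may still be running on executors. This is a simplified model of that check, not Spark source; the 1 GB limit and 300 MB result sizes are made-up illustration values.

```python
MAX_RESULT_SIZE = 1 << 30  # models spark.driver.maxResultSize = 1g

def fetch_results(result_sizes, max_result_size=MAX_RESULT_SIZE):
    """Accumulate task result sizes; fail once the total exceeds the limit."""
    total = 0
    for i, size in enumerate(result_sizes):
        total += size
        if total > max_result_size:
            raise RuntimeError(
                f"Total size of serialized results of {i + 1} tasks "
                f"({total} bytes) is bigger than maxResultSize "
                f"({max_result_size} bytes)")
    return total

# 5 tasks returning 300 MB each: the job is aborted while fetching the 4th
# result, yet the remaining tasks can linger on executors -- matching the
# report of "active tasks" after the job has already failed.
try:
    fetch_results([300 << 20] * 5)
except RuntimeError as e:
    print("job failed:", e)
```

Raising spark.driver.maxResultSize (or collecting less data to the driver) avoids the abort; whether the leftover tasks are cleaned up is the bug this issue reports.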
[jira] [Updated] (SPARK-24672) No job is running but there are active tasks
[ https://issues.apache.org/jira/browse/SPARK-24672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-24672: --- Attachment: image4.png > No job is running but there are active tasks > > > Key: SPARK-24672 > URL: https://issues.apache.org/jira/browse/SPARK-24672 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 2.2.1 > Environment: hadoop 2.6.0 > spark 2.2.1 > CDH 5.8.3 > java 1.8.0 > > More info : > image1.png in Attachments >Reporter: Deegue >Priority: Major > Attachments: image1.png, image2.png, image3.png, image4.png > > Original Estimate: 120h > Remaining Estimate: 120h > > Issues: > > There are active tasks while no job is running. > These active tasks occupy the executors(resources) , and I don't know why > they haven't been killed or stopped after its jobs failed. > > More info : > image2.png & image3.png in Attachments > > I'd be very appreciated it if anyone can help me... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24672) No job is running but there are active tasks
[ https://issues.apache.org/jira/browse/SPARK-24672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-24672: --- Description: Issues: There are active tasks while no job is running. These active tasks occupy the executors(resources) , and I don't know why they haven't been killed or stopped after its jobs failed. More info : image2.png & image3.png in Attachments I'd be very appreciated it if anyone can help me... was: Issues: There are active tasks while no job is running. These active tasks occupy the executors(resources) , and I don't know why they haven't been killed or stopped after its jobs failed. More info : image2.png & image3.png & image4.png in Attachments I'd be very appreciated it if anyone can help me... > No job is running but there are active tasks > > > Key: SPARK-24672 > URL: https://issues.apache.org/jira/browse/SPARK-24672 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 2.2.1 > Environment: hadoop 2.6.0 > spark 2.2.1 > CDH 5.8.3 > java 1.8.0 > > More info : > image1.png in Attachments >Reporter: Deegue >Priority: Major > Attachments: image1.png, image2.png, image3.png > > Original Estimate: 120h > Remaining Estimate: 120h > > Issues: > > There are active tasks while no job is running. > These active tasks occupy the executors(resources) , and I don't know why > they haven't been killed or stopped after its jobs failed. > > More info : > image2.png & image3.png in Attachments > > I'd be very appreciated it if anyone can help me... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24672) No job is running but there are active tasks
[ https://issues.apache.org/jira/browse/SPARK-24672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-24672: --- Attachment: image3.png image2.png > No job is running but there are active tasks > > > Key: SPARK-24672 > URL: https://issues.apache.org/jira/browse/SPARK-24672 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 2.2.1 > Environment: hadoop 2.6.0 > spark 2.2.1 > CDH 5.8.3 > java 1.8.0 > > More info : > image1.png in Attachments >Reporter: Deegue >Priority: Major > Attachments: image1.png, image2.png, image3.png > > Original Estimate: 120h > Remaining Estimate: 120h > > Issues: > > There are active tasks while no job is running. > These active tasks occupy the executors(resources) , and I don't know why > they haven't been killed or stopped after its jobs failed. > > More info : > image2.png & image3.png & image4.png in Attachments > > I'd be very appreciated it if anyone can help me... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24672) No job is running but there are active tasks
[ https://issues.apache.org/jira/browse/SPARK-24672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-24672: --- Description: Issues: There are active tasks while no job is running. These active tasks occupy the executors(resources) , and I don't know why they haven't been killed or stopped after its jobs failed. More info : image2.png & image3.png & image4.png in Attachments I'd be very appreciated it if anyone can help me... was: Issues: There are active tasks while no job is running. These active tasks occupy the executors(resources) , and I don't know why they haven't been killed or stopped after its jobs failed. !image-2018-06-28-15-18-50-877.png! !image-2018-06-28-15-25-50-812.png! !image-2018-06-28-15-26-54-721.png! > No job is running but there are active tasks > > > Key: SPARK-24672 > URL: https://issues.apache.org/jira/browse/SPARK-24672 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 2.2.1 > Environment: hadoop 2.6.0 > spark 2.2.1 > CDH 5.8.3 > java 1.8.0 > > More info : > image1.png in Attachments >Reporter: Deegue >Priority: Major > Attachments: image1.png > > Original Estimate: 120h > Remaining Estimate: 120h > > Issues: > > There are active tasks while no job is running. > These active tasks occupy the executors(resources) , and I don't know why > they haven't been killed or stopped after its jobs failed. > > More info : > image2.png & image3.png & image4.png in Attachments > > I'd be very appreciated it if anyone can help me... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24672) No job is running but there are active tasks
[ https://issues.apache.org/jira/browse/SPARK-24672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-24672: --- Environment: hadoop 2.6.0 spark 2.2.1 CDH 5.8.3 java 1.8.0 More info : image1.png in Attachments was: hadoop 2.6.0 spark 2.2.1 CDH 5.8.3 java 1.8.0 image1.png > No job is running but there are active tasks > > > Key: SPARK-24672 > URL: https://issues.apache.org/jira/browse/SPARK-24672 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 2.2.1 > Environment: hadoop 2.6.0 > spark 2.2.1 > CDH 5.8.3 > java 1.8.0 > > More info : > image1.png in Attachments >Reporter: Deegue >Priority: Major > Attachments: image1.png > > Original Estimate: 120h > Remaining Estimate: 120h > > Issues: > > There are active tasks while no job is running. > These active tasks occupy the executors(resources) , and I don't know why > they haven't been killed or stopped after its jobs failed. > > !image-2018-06-28-15-18-50-877.png! > > !image-2018-06-28-15-25-50-812.png! > > !image-2018-06-28-15-26-54-721.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24672) No job is running but there are active tasks
[ https://issues.apache.org/jira/browse/SPARK-24672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-24672: --- Attachment: image1.png > No job is running but there are active tasks > > > Key: SPARK-24672 > URL: https://issues.apache.org/jira/browse/SPARK-24672 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 2.2.1 > Environment: hadoop 2.6.0 > spark 2.2.1 > CDH 5.8.3 > java 1.8.0 > > image1.png >Reporter: Deegue >Priority: Major > Attachments: image1.png > > Original Estimate: 120h > Remaining Estimate: 120h > > Issues: > > There are active tasks while no job is running. > These active tasks occupy the executors(resources) , and I don't know why > they haven't been killed or stopped after its jobs failed. > > !image-2018-06-28-15-18-50-877.png! > > !image-2018-06-28-15-25-50-812.png! > > !image-2018-06-28-15-26-54-721.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24672) No job is running but there are active tasks
[ https://issues.apache.org/jira/browse/SPARK-24672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deegue updated SPARK-24672: --- Environment: hadoop 2.6.0 spark 2.2.1 CDH 5.8.3 java 1.8.0 image1.png was: hadoop 2.6.0 spark 2.2.1 CDH 5.8.3 java 1.8.0 !image-2018-06-28-15-28-11-200.png! > No job is running but there are active tasks > > > Key: SPARK-24672 > URL: https://issues.apache.org/jira/browse/SPARK-24672 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 2.2.1 > Environment: hadoop 2.6.0 > spark 2.2.1 > CDH 5.8.3 > java 1.8.0 > > image1.png >Reporter: Deegue >Priority: Major > Original Estimate: 120h > Remaining Estimate: 120h > > Issues: > > There are active tasks while no job is running. > These active tasks occupy the executors(resources) , and I don't know why > they haven't been killed or stopped after its jobs failed. > > !image-2018-06-28-15-18-50-877.png! > > !image-2018-06-28-15-25-50-812.png! > > !image-2018-06-28-15-26-54-721.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24672) No job is running but there are active tasks
Deegue created SPARK-24672: -- Summary: No job is running but there are active tasks Key: SPARK-24672 URL: https://issues.apache.org/jira/browse/SPARK-24672 Project: Spark Issue Type: Bug Components: Optimizer, Spark Core, SQL Affects Versions: 2.2.1 Environment: hadoop 2.6.0 spark 2.2.1 CDH 5.8.3 java 1.8.0 !image-2018-06-28-15-28-11-200.png! Reporter: Deegue Issues: There are active tasks while no job is running. These active tasks occupy the executors (resources), and I don't know why they haven't been killed or stopped after their jobs failed. !image-2018-06-28-15-18-50-877.png! !image-2018-06-28-15-25-50-812.png! !image-2018-06-28-15-26-54-721.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org