[
https://issues.apache.org/jira/browse/SPARK-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174165#comment-14174165
]
Marcelo Vanzin commented on SPARK-3979:
---
BTW, this would avoid issues like this:
{noformat}
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): file /user/systest/.sparkStaging/application_1413485082283_0001/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0.jar. Requested replication 3 exceeds maximum 1
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:943)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInt(FSNamesystem.java:2243)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:2233)
        ...
        at org.apache.spark.deploy.yarn.ClientBase$class.copyFileToRemote(ClientBase.scala:101)
{noformat}
Yarn backend's default file replication should match HDFS's default one
---
Key: SPARK-3979
URL: https://issues.apache.org/jira/browse/SPARK-3979
Project: Spark
Issue Type: Bug
Components: YARN
Reporter: Marcelo Vanzin
Priority: Minor
This code in ClientBase.scala sets the replication factor used for files uploaded to HDFS:
{noformat}
val replication = sparkConf.getInt("spark.yarn.submit.file.replication", 3).toShort
{noformat}
Instead of a hardcoded 3 (which merely happens to be HDFS's default), it should fall back to the default value from the HDFS configuration (dfs.replication), so clusters configured with a different replication factor work out of the box.
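A minimal sketch of the proposed fallback behavior (the helper name and its arguments are illustrative, not the actual patch; in the real code the hdfsDefault value would come from the Hadoop API, e.g. FileSystem.getDefaultReplication(path)):

```scala
// Hypothetical helper illustrating the proposed behavior: honor an explicit
// spark.yarn.submit.file.replication setting if present, otherwise fall back
// to the cluster's own dfs.replication default instead of a hardcoded 3.
def chooseReplication(userSetting: Option[String], hdfsDefault: Short): Short =
  userSetting.map(_.toShort).getOrElse(hdfsDefault)

// On a one-node cluster where dfs.replication = 1 and the user set nothing,
// the upload would request replication 1, avoiding the RemoteException above.
val onDefault  = chooseReplication(None, 1)       // 1
val onExplicit = chooseReplication(Some("2"), 1)  // 2
```

With this change the error in the comment above cannot occur on a default setup, since the requested replication never exceeds what the cluster itself is configured for unless the user explicitly asks for more.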
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)