[jira] [Commented] (SPARK-3979) Yarn backend's default file replication should match HDFS's default one

2014-10-16 Thread Marcelo Vanzin (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174165#comment-14174165 ]

Marcelo Vanzin commented on SPARK-3979:
---

BTW, this would avoid issues like this:

{noformat}
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): file /user/systest/.sparkStaging/application_1413485082283_0001/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0.jar.
Requested replication 3 exceeds maximum 1
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:943)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInt(FSNamesystem.java:2243)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:2233)
    ...
    at org.apache.spark.deploy.yarn.ClientBase$class.copyFileToRemote(ClientBase.scala:101)
{noformat}

 Yarn backend's default file replication should match HDFS's default one
 ---

 Key: SPARK-3979
 URL: https://issues.apache.org/jira/browse/SPARK-3979
 Project: Spark
  Issue Type: Bug
  Components: YARN
Reporter: Marcelo Vanzin
Priority: Minor

 This code in ClientBase.scala sets the replication used for files uploaded to 
 HDFS:
 {{noformat}}
 val replication = sparkConf.getInt("spark.yarn.submit.file.replication", 3).toShort
 {{noformat}}
 Instead of a hardcoded 3 (which is the default value for HDFS), it should 
 be using the default value from the HDFS conf (dfs.replication).
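
A minimal sketch of the proposed lookup order, not the actual patch. To keep it runnable without Spark or Hadoop on the classpath, plain Maps stand in for SparkConf and the Hadoop Configuration; the object and method names (ReplicationDefault, resolveReplication) are hypothetical. The Spark setting wins when present; otherwise the value of dfs.replication is used, and 3 is only the last-resort fallback.

```scala
// Hypothetical stand-ins: plain Maps play the role of SparkConf and the
// Hadoop Configuration so the sketch runs without either library.
object ReplicationDefault {
  // Last-resort fallback, used only when neither configuration sets a value.
  val HdfsBuiltInDefault: Short = 3

  /** Choose the replication for uploaded files:
    * spark.yarn.submit.file.replication first, then dfs.replication.
    */
  def resolveReplication(sparkConf: Map[String, String],
                         hdfsConf: Map[String, String]): Short = {
    // Default taken from the HDFS configuration instead of a hardcoded 3.
    val hdfsDefault: Short =
      hdfsConf.get("dfs.replication").map(_.toShort).getOrElse(HdfsBuiltInDefault)

    // An explicit Spark setting still overrides the HDFS default.
    sparkConf.get("spark.yarn.submit.file.replication")
      .map(_.toShort)
      .getOrElse(hdfsDefault)
  }
}
```

With dfs.replication set to 1 and no Spark override, this resolves to 1 rather than 3, so a request like the one in the stack trace above would not exceed a maximum of 1. In ClientBase.scala the HDFS default would presumably come from the Hadoop configuration or the destination FileSystem rather than from a Map.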



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3979) Yarn backend's default file replication should match HDFS's default one

2014-10-16 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174224#comment-14174224 ]

Apache Spark commented on SPARK-3979:
-

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/2831

 Yarn backend's default file replication should match HDFS's default one
 ---

 Key: SPARK-3979
 URL: https://issues.apache.org/jira/browse/SPARK-3979
 Project: Spark
  Issue Type: Bug
  Components: YARN
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin
Priority: Minor

 This code in ClientBase.scala sets the replication used for files uploaded to 
 HDFS:
 {code}
 val replication = sparkConf.getInt("spark.yarn.submit.file.replication", 3).toShort
 {code}
 Instead of a hardcoded 3 (which is the default value for HDFS), it should 
 be using the default value from the HDFS conf (dfs.replication).


