[jira] [Commented] (SPARK-36917) java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in i
[ https://issues.apache.org/jira/browse/SPARK-36917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754259#comment-17754259 ]

Davide Benedetto commented on SPARK-36917:
------------------------------------------

Dear [~hvanhovell], I solved this issue by distributing the jar containing my custom function over HDFS (a sketch of that approach follows this message).

> java.lang.ClassCastException: cannot assign instance of
> java.lang.invoke.SerializedLambda to field
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance
> of org.apache.spark.rdd.MapPartitionsRDD
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-36917
>                 URL: https://issues.apache.org/jira/browse/SPARK-36917
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Submit
>    Affects Versions: 3.1.2
>         Environment: Ubuntu 20
> Spark 3.1.2-hadoop3.2
> Hadoop 3.1
>            Reporter: Davide Benedetto
>            Priority: Major
>              Labels: hadoop, java, serializable, spark, spark-conf, ubuntu

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
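The comment above does not include code, so here is a minimal sketch of that kind of fix: uploading the application jar (the one that contains MyFunction2) to HDFS and pointing spark.jars at it, so the YARN executors can load the class when deserializing the task. The jar name and HDFS path below are assumptions, not taken from the ticket.

{code:java}
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch: jar name and HDFS location are assumptions.
// Shipping the application jar to the executors is what avoids the
// SerializedLambda / MapPartitionsRDD ClassCastException, because the
// executor classloader can then resolve the deserialized function.
SparkConf sparkConf = new SparkConf()
        .setAppName("simple")
        .setMaster("yarn")
        // assumed upload: hdfs dfs -put simple-app.jar /user/hadoop/jars/
        .set("spark.jars", "hdfs://localhost:9000/user/hadoop/jars/simple-app.jar");

SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();

// Equivalent effect once the session exists:
// spark.sparkContext().addJar("hdfs://localhost:9000/user/hadoop/jars/simple-app.jar");
{code}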
[jira] [Commented] (SPARK-36917) java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in i
[ https://issues.apache.org/jira/browse/SPARK-36917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423918#comment-17423918 ]

Davide Benedetto commented on SPARK-36917:
------------------------------------------

I have checked other Jira tickets, but unfortunately none of them resolved my problem. [~hyukjin.kwon] I have found a similar issue that you closed, but I cannot reach the resolution: https://issues.apache.org/jira/browse/SPARK-5506?jql=text%20~%20%22java.lang.ClassCastException%20Serialization%22

> java.lang.ClassCastException: cannot assign instance of
> java.lang.invoke.SerializedLambda to field
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance
> of org.apache.spark.rdd.MapPartitionsRDD
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-36917
>                 URL: https://issues.apache.org/jira/browse/SPARK-36917
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Submit
>    Affects Versions: 3.1.2
>         Environment: Ubuntu 20
> Spark 3.1.2-hadoop3.2
> Hadoop 3.1
>            Reporter: Davide Benedetto
>            Priority: Major
>              Labels: hadoop, java, serializable, spark, spark-conf, ubuntu

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36917) java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in i
[ https://issues.apache.org/jira/browse/SPARK-36917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423867#comment-17423867 ]

Davide Benedetto commented on SPARK-36917:
------------------------------------------

[~srowen] Can you please help me with this issue?

> java.lang.ClassCastException: cannot assign instance of
> java.lang.invoke.SerializedLambda to field
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance
> of org.apache.spark.rdd.MapPartitionsRDD
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-36917
>                 URL: https://issues.apache.org/jira/browse/SPARK-36917
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Submit
>    Affects Versions: 3.1.2
>         Environment: Ubuntu 20
> Spark 3.1.2-hadoop3.2
> Hadoop 3.1
>            Reporter: Davide Benedetto
>            Priority: Major
>              Labels: hadoop, java, serializable, spark, spark-conf, ubuntu

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18075) UDF doesn't work on non-local spark
[ https://issues.apache.org/jira/browse/SPARK-18075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423740#comment-17423740 ]

Davide Benedetto commented on SPARK-18075:
------------------------------------------

[~cloud_fan] Are you now able to run the job with YARN from the IDE?

> UDF doesn't work on non-local spark
> -----------------------------------
>
>                 Key: SPARK-18075
>                 URL: https://issues.apache.org/jira/browse/SPARK-18075
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1, 2.0.0
>            Reporter: Nick Orka
>            Priority: Major
>
> I have the issue with Spark 2.0.0 (spark-2.0.0-bin-hadoop2.7.tar.gz).
> According to ticket https://issues.apache.org/jira/browse/SPARK-9219
> I've given all Spark dependencies PROVIDED scope. I use 100% the same
> versions of Spark in the app as well as for the Spark server.
> Here is my pom:
> {code:title=pom.xml}
> <properties>
>     <maven.compiler.source>1.6</maven.compiler.source>
>     <maven.compiler.target>1.6</maven.compiler.target>
>     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
>     <scala.version>2.11.8</scala.version>
>     <spark.version>2.0.0</spark.version>
>     <hadoop.version>2.7.0</hadoop.version>
> </properties>
>
> <dependencies>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-core_2.11</artifactId>
>         <version>${spark.version}</version>
>         <scope>provided</scope>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-sql_2.11</artifactId>
>         <version>${spark.version}</version>
>         <scope>provided</scope>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-hive_2.11</artifactId>
>         <version>${spark.version}</version>
>         <scope>provided</scope>
>     </dependency>
> </dependencies>
> {code}
> As you can see, all Spark dependencies have provided scope.
> And this is the code for reproduction:
> {code:title=udfTest.scala}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
> import org.apache.spark.sql.{Row, SparkSession}
>
> /**
>   * Created by nborunov on 10/19/16.
>   */
> object udfTest {
>
>   class Seq extends Serializable {
>     var i = 0
>
>     def getVal: Int = {
>       i = i + 1
>       i
>     }
>   }
>
>   def main(args: Array[String]) {
>     val spark = SparkSession
>       .builder()
>       .master("spark://nborunov-mbp.local:7077")
>       // .master("local")
>       .getOrCreate()
>
>     val rdd = spark.sparkContext.parallelize(Seq(Row("one"), Row("two")))
>     val schema = StructType(Array(StructField("name", StringType)))
>     val df = spark.createDataFrame(rdd, schema)
>     df.show()
>
>     spark.udf.register("func", (name: String) => name.toUpperCase)
>     import org.apache.spark.sql.functions.expr
>     val newDf = df.withColumn("upperName", expr("func(name)"))
>     newDf.show()
>
>     val seq = new Seq
>     spark.udf.register("seq", () => seq.getVal)
>     val seqDf = df.withColumn("id", expr("seq()"))
>     seqDf.show()
>
>     df.createOrReplaceTempView("df")
>     spark.sql("select *, seq() as sql_id from df").show()
>   }
> }
> {code}
> When .master("local"), everything works fine. When .master("spark://...:7077"), it fails on the line:
> {code}
> newDf.show()
> {code}
> The error is exactly the same:
> {code}
> scala> udfTest.main(Array())
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/Users/nborunov/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/Users/nborunov/.m2/repository/ch/qos/logback/logback-classic/1.1.7/logback-classic-1.1.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/10/19 19:37:52 INFO SparkContext: Running Spark version 2.0.0
> 16/10/19 19:37:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/10/19 19:37:52 INFO SecurityManager: Changing view acls to: nborunov
> 16/10/19 19:37:52 INFO SecurityManager: Changing modify acls to: nborunov
> 16/10/19 19:37:52 INFO SecurityManager: Changing view acls groups to:
> 16/10/19 19:37:52 INFO SecurityManager: Changing modify acls groups to:
> 16/10/19 19:37:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nborunov); groups with view permissions: Set(); users with modify permissions: Set(nborunov); groups with modify permissions: Set()
> 16/10/19 19:37:53 INFO Utils: Successfully started service 'sparkDriver' on port 57828.
> 16/10/19 19:37:53 INFO SparkEnv: Registering MapOutputTracker
> 16/10/19 19:37:53 INFO SparkEnv: Registering BlockManagerMaster
> 16/10/19 19:37:53 INFO DiskBlockManager: Created local directory at /private/var/folders/hl/2fv6555n2w92272zywwvpbzhgq/T/blockmgr-f2d05423-b7f7-4525-b41e-10dfe2f88264
> 16/10/19 19:37:53 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
> 16/10/19 19:37:53 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/
> {code}
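For comparison, a hedged Java sketch of the same UDF experiment against a remote master. The jar path and the bean class are invented for illustration; the point it demonstrates is that the application jar has to reach the executors (for example via spark.jars), which is the usual reason a UDF works with master("local") but fails against a non-local master.

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

public class UdfTestJava {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("udfTest")
                .master("spark://nborunov-mbp.local:7077")   // remote master from the report
                // Assumed jar path: the compiled application jar must be shipped to the
                // executors, otherwise the UDF cannot be deserialized there.
                .config("spark.jars", "/path/to/udf-test.jar")
                .getOrCreate();

        // Build a tiny DataFrame from a Java bean (example data only).
        Dataset<Row> df = spark.createDataFrame(
                java.util.Arrays.asList(new Person("one"), new Person("two")), Person.class);

        // Register the UDF through the explicit Java interface instead of a Scala closure.
        spark.udf().register("func", (UDF1<String, String>) String::toUpperCase, DataTypes.StringType);

        df.withColumn("upperName", callUDF("func", col("name"))).show();
    }

    // Simple bean used only to build the example DataFrame.
    public static class Person implements java.io.Serializable {
        private String name;
        public Person() { }
        public Person(String name) { this.name = name; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }
}
{code}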
[jira] [Created] (SPARK-36917) java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in ins
Davide Benedetto created SPARK-36917:
-----------------------------------------

             Summary: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
                 Key: SPARK-36917
                 URL: https://issues.apache.org/jira/browse/SPARK-36917
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Spark Submit
    Affects Versions: 3.1.2
         Environment: Ubuntu 20
Spark 3.1.2-hadoop3.2
Hadoop 3.1
            Reporter: Davide Benedetto


My Spark job fails with this error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (davben-lubuntu executor 2): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD

My Ubuntu 20 machine is organized as follows: there are two users, /home/davben and /home/hadoop. Hadoop 3.1 and spark-3.1.2-hadoop3.2 are installed under the hadoop user, and both users point to the same java-8-openjdk installation. The Spark job is launched by user davben from the Eclipse IDE as follows.

First I create the SparkConf and the SparkSession:

{code:java}
System.setProperty("hadoop.home.dir", "/home/hadoop/hadoop");
SparkConf sparkConf = new SparkConf()
    .setAppName("simple")
    .setMaster("yarn")
    .set("spark.executor.memory", "1g")
    .set("deploy.mode", "cluster")
    .set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/hadoop/")
    .set("spark.hadoop.fs.defaultFS", "hdfs://localhost:9000")
    .set("spark.hadoop.yarn.resourcemanager.hostname", "localhost")
    .set("spark.hadoop.yarn.resourcemanager.scheduler.address", "localhost:8030")
    .set("spark.hadoop.yarn.resourcemanager.address", "localhost:8032")
    .set("spark.hadoop.yarn.resourcemanager.webapp.address", "localhost:8088")
    .set("spark.hadoop.yarn.resourcemanager.admin.address", "localhost:8083");
SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
{code}

Then I create a dataset with two entries:

{code:java}
List<Row> rows = new ArrayList<>();
rows.add(RowFactory.create("a", "b"));
rows.add(RowFactory.create("a", "a"));

StructType structType = new StructType();
structType = structType.add("edge_1", DataTypes.StringType, false);
structType = structType.add("edge_2", DataTypes.StringType, false);
ExpressionEncoder<Row> edgeEncoder = RowEncoder.apply(structType);

Dataset<Row> edge = spark.createDataset(rows, edgeEncoder);
{code}

Then I print the content of the current dataset edge:

{code:java}
edge.show();
{code}

Then I perform a map transformation on edge that upper-cases the values of the two entries and returns the result in edge2:

{code:java}
Dataset<Row> edge2 = edge.map(new MyFunction2(), edgeEncoder);
{code}

The following is the code of MyFunction2:

{code:java}
public class MyFunction2 implements MapFunction<Row, Row>, scala.Serializable {

    private static final long serialVersionUID = 1L;

    @Override
    public Row call(Row v1) throws Exception {
        String el1 = v1.get(0).toString().toUpperCase();
        String el2 = v1.get(1).toString().toUpperCase();
        return RowFactory.create(el1, el2);
    }
}
{code}

Finally I show the content of edge2:

{code:java}
edge2.show();
{code}

Checking the Hadoop UI at localhost:8088, I can confirm that the job is submitted correctly. What is strange is that the first show() returns correctly in my console, but the second one fails with the error above.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
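For context on where java.lang.invoke.SerializedLambda enters the picture: the function passed to Dataset.map is serialized on the driver and deserialized on each executor, and a function compiled as a Java 8 lambda travels exactly as a SerializedLambda, which can only be resolved if the class that defines it (the application jar) is on the executor classpath. A minimal sketch, assuming the edge dataset and edgeEncoder built in the report above, of the same transformation written as a lambda:

{code:java}
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// Sketch only: 'edge' and 'edgeEncoder' are the objects built in the report above.
// The cast tells Spark to treat the lambda as a Java MapFunction; either way,
// deserializing it on an executor requires the defining class (the application
// jar) on the executor classpath, which is why shipping the jar fixes the error.
Dataset<Row> edge2 = edge.map(
        (MapFunction<Row, Row>) row -> RowFactory.create(
                row.getString(0).toUpperCase(),
                row.getString(1).toUpperCase()),
        edgeEncoder);
edge2.show();
{code}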
[jira] [Created] (SPARK-36875) 21/09/28 11:18:51 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resourc
Davide Benedetto created SPARK-36875:
-----------------------------------------

             Summary: 21/09/28 11:18:51 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
                 Key: SPARK-36875
                 URL: https://issues.apache.org/jira/browse/SPARK-36875
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Spark Submit
    Affects Versions: 3.1.2
         Environment: Eclipse
Hadoop 3.3
Spark 3.1.2-hadoop3.2

Dependencies:
{noformat}
org.apache.spark:spark-core_2.12:3.1.2
org.apache.spark:spark-sql_2.12:3.1.2 (excluding org.codehaus.janino:janino)
org.codehaus.janino:janino:3.0.8
org.apache.spark:spark-yarn_2.12:3.1.2 (provided)
org.scala-lang:scala-library:2.12.13

Environment variables set in Eclipse:
SPARK_HOME=path/to/my/sparkfolder

OS: Linux with Ubuntu 20
The test is launched by my first user, davben. The Spark folder and Hadoop belong to my second user, hadoop.
{noformat}
            Reporter: Davide Benedetto
             Fix For: 3.1.2


Hi, I am running a Spark job on YARN programmatically from the Eclipse IDE.

1: I open the Spark session, passing a SparkConf as input parameter:

{code:java}
System.setProperty("hadoop.home.dir", "/home/hadoop/hadoop");
System.setProperty("SPARK_YARN_MODE", "yarn");
System.setProperty("HADOOP_USER_NAME", "hadoop");

SparkConf sparkConf = new SparkConf().setAppName("simpleTest2").setMaster("yarn")
    .set("spark.executor.memory", "1g")
    .set("deploy.mode", "cluster")
    .set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/hadoop/")
    .set("spark.yarn.am.memory", "512m")
    .set("spark.dynamicAllocation.minExecutors", "1")
    .set("spark.dynamicAllocation.maxExecutors", "40")
    .set("spark.dynamicAllocation.initialExecutors", "2")
    .set("spark.shuffle.service.enabled", "true")
    .set("spark.dynamicAllocation.enabled", "false")
    .set("spark.cores.max", "1")
    .set("spark.yarn.executor.memoryOverhead", "500m")
    .set("spark.executor.instances", "2")
    .set("spark.executor.memory", "500m")
    .set("spark.num.executors", "2")
    .set("spark.executor.cores", "1")
    .set("spark.worker.instances", "1")
    .set("spark.worker.memory", "512m")
    .set("spark.worker.max.heapsize", "512m")
    .set("spark.worker.cores", "1")
    .set("maximizeResourceAllocation", "true")
    .set("spark.yarn.nodemanager.resource.cpu-vcores", "4")
    .set("spark.yarn.submit.file.replication", "1");

SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
{code}

2: I create a dataset of Rows and show them:

{code:java}
List<Row> rows = new ArrayList<>();
rows.add(RowFactory.create("a", "b"));
rows.add(RowFactory.create("b", "c"));
rows.add(RowFactory.create("a", "a"));

StructType structType = new StructType();
structType = structType.add("edge_1", DataTypes.StringType, false);
structType = structType.add("edge_2", DataTypes.StringType, false);
ExpressionEncoder<Row> edgeEncoder = RowEncoder.apply(structType);

Dataset<Row> edge = spark.createDataset(rows, edgeEncoder);
edge.show();
{code}

Up to here everything is OK: the job is submitted on Hadoop and the rows are shown correctly.

3: I perform a map that upper-cases the elements in each row:

{code:java}
Dataset<Row> edge2 = edge.map(new MyFunction2(), edgeEncoder);

public static class MyFunction2 implements MapFunction<Row, Row> {

    private static final long serialVersionUID = 1L;

    @Override
    public Row call(Row v1) throws Exception {
        String el1 = v1.get(0).toString().toUpperCase();
        String el2 = v1.get(1).toString().toUpperCase();
        return RowFactory.create(el1, el2);
    }
}
{code}

4: Then I show the dataset after the map is performed:

{code:java}
edge2.show();
{code}

And precisely here the log loops, saying:

21/09/28 11:18:51 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Here is the full log:

{noformat}
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/09/28 03:05:16 WARN Utils: Your hostname, davben-lubuntu resolves to a loopback address: 127.0.1.1; using 192.168.1.36 instead (on interface wlo1)
21/09/28 03:05:16 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to ano
{noformat}
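A common cause of this warning on a single-node YARN setup is that the requested AM/executor containers (memory plus overhead, and cores) never fit into what the NodeManager can allocate, so the job stays pending forever. Below is a hedged sketch of a deliberately small resource request; the values are assumptions for illustration, not a resolution taken from the ticket.

{code:java}
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

// Illustrative values only. The idea: executor memory + memoryOverhead and the
// AM memory must each fit inside yarn.scheduler.maximum-allocation-mb and the
// NodeManager's yarn.nodemanager.resource.memory-mb, and the requested cores
// must be available, otherwise YARN never grants a container.
SparkConf sparkConf = new SparkConf()
        .setAppName("simpleTest2")
        .setMaster("yarn")
        .set("spark.submit.deployMode", "client")      // keep the driver in the IDE
        .set("spark.executor.instances", "1")          // start with one small executor
        .set("spark.executor.cores", "1")
        .set("spark.executor.memory", "512m")
        .set("spark.executor.memoryOverhead", "384m")  // 512m + 384m must fit in one container
        .set("spark.yarn.am.memory", "512m")
        .set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/hadoop/");

SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
{code}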
[jira] [Commented] (SPARK-17722) YarnScheduler: Initial job has not accepted any resources
[ https://issues.apache.org/jira/browse/SPARK-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420999#comment-17420999 ]

Davide Benedetto commented on SPARK-17722:
------------------------------------------

Hi Partha, I have the same issue. Which YARN configurations have you set?

> YarnScheduler: Initial job has not accepted any resources
> ----------------------------------------------------------
>
>                 Key: SPARK-17722
>                 URL: https://issues.apache.org/jira/browse/SPARK-17722
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Partha Pratim Ghosh
>            Priority: Major
>
> Connected to Spark in yarn mode from Eclipse (Java). On trying to run a task it gives the following -
> YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources.
> The request reaches the Hadoop cluster scheduler and from there we can see the job in the Spark UI. But there it says that no task has been assigned for this.
> The same code runs from spark-submit, where we need to remove the following lines -
> System.setProperty("java.security.krb5.conf", "C:\\xxx\\krb5.conf");
>
> org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
> conf.set("hadoop.security.authentication", "kerberos");
> UserGroupInformation.setConfiguration(conf);
> Following is the configuration -
> import org.apache.hadoop.security.UserGroupInformation;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.DataFrame;
> import org.apache.spark.sql.SQLContext;
>
> public class TestConnectivity {
>
>     /**
>      * @param args
>      */
>     public static void main(String[] args) {
>         System.setProperty("java.security.krb5.conf", "C:\\xxx\\krb5.conf");
>
>         org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
>         conf.set("hadoop.security.authentication", "kerberos");
>         UserGroupInformation.setConfiguration(conf);
>
>         SparkConf config = new SparkConf().setAppName("Test Spark ");
>         config = config.setMaster("yarn-client");
>         config.set("spark.dynamicAllocation.enabled", "false");
>         config.set("spark.executor.memory", "2g");
>         config.set("spark.executor.instances", "1");
>         config.set("spark.executor.cores", "2");
>         //config.set("spark.driver.memory", "2g");
>         //config.set("spark.driver.cores", "1");
>         /*config.set("spark.executor.am.memory", "2g");
>         config.set("spark.executor.am.cores", "2");*/
>         config.set("spark.cores.max", "4");
>         config.set("yarn.nodemanager.resource.cpu-vcores", "4");
>         config.set("spark.yarn.queue", "root.root");
>         /*config.set("spark.deploy.defaultCores", "2");
>         config.set("spark.task.cpus", "2");*/
>         config.set("spark.yarn.jar", "file:/C:/xxx/spark-assembly_2.10-1.6.0-cdh5.7.1.jar");
>
>         JavaSparkContext sc = new JavaSparkContext(config);
>         SQLContext sqlcontext = new SQLContext(sc);
>         //DataFrame df = sqlcontext.jsonFile(logFile);
>         JavaRDD<String> logData = sc.textFile("sparkexamples/Employee.json").cache();
>         DataFrame df = sqlcontext.jsonRDD(logData);
>
>         df.show();
>         df.printSchema();
>
>         //UserGroupInformation.setConfiguration(conf);
>     }
> }

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
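Not from the ticket, but a minimal sketch of how the Kerberos setup in the snippet above is often combined with an explicit keytab login when driving YARN from an IDE, together with a small resource request. The principal, keytab path, and resource values are placeholders.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class KerberosYarnSketch {

    public static void main(String[] args) throws Exception {
        // Placeholder values: adapt krb5.conf, principal and keytab to the cluster.
        System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");

        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Explicit keytab login instead of relying on a ticket cache in the IDE's JVM.
        UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab");

        SparkConf config = new SparkConf()
                .setAppName("Test Spark")
                .setMaster("yarn-client")                 // Spark 1.6-era master string, as in the report
                .set("spark.executor.instances", "1")     // keep the request small enough for the queue
                .set("spark.executor.memory", "1g")
                .set("spark.executor.cores", "1")
                .set("spark.yarn.queue", "root.root");

        JavaSparkContext sc = new JavaSparkContext(config);
        // Tiny smoke test: read a file and print a couple of lines.
        sc.textFile("sparkexamples/Employee.json").take(2).forEach(System.out::println);
        sc.stop();
    }
}
{code}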