[jira] [Commented] (SPARK-36917) java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in i

2023-08-14 Thread Davide Benedetto (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754259#comment-17754259
 ] 

Davide Benedetto commented on SPARK-36917:
--

Dear [~hvanhovell], I solved this issue by distributing the jar containing my
custom function over HDFS.
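
For reference, a minimal sketch of that workaround (the jar name and HDFS path
below are hypothetical): upload the job jar to HDFS and point spark.jars at it
so the YARN executors can load the custom function class.
{code:java}
// Hedged sketch of the workaround described above; the jar name and HDFS path
// are hypothetical. The jar is uploaded first, e.g.:
//   hdfs dfs -put simple-job.jar /user/hadoop/jars/
SparkConf sparkConf = new SparkConf()
        .setAppName("simple")
        .setMaster("yarn")
        .set("spark.jars", "hdfs://localhost:9000/user/hadoop/jars/simple-job.jar");
SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
{code}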

> java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD
> ---
>
> Key: SPARK-36917
> URL: https://issues.apache.org/jira/browse/SPARK-36917
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Submit
>Affects Versions: 3.1.2
> Environment: Ubuntu 20
> Spark3.1.2-hadoop3.2
> Hadoop 3.1
>Reporter: Davide Benedetto
>Priority: Major
>  Labels: hadoop, java, serializable, spark, spark-conf, ubuntu
>
> My spark Job fails with this error:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3) (davben-lubuntu executor 2): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD
> My Ubuntu 20 OS is organized as follows: I have two users, /home/davben
> and /home/hadoop. Under the hadoop user I have installed Hadoop 3.1 and
> spark-3.1.2-hadoop3.2. Both users refer to the java-8-openjdk Java
> installation. The Spark job is launched by user davben from the Eclipse IDE
> as follows:
> First I create the SparkConf and the SparkSession:
>  
> {code:java}
> System.setProperty("hadoop.home.dir", "/home/hadoop/hadoop");
> SparkConf sparkConf = new SparkConf()
> .setAppName("simple")
> .setMaster("yarn")
> .set("spark.executor.memory", "1g")
> .set("deploy.mode", "cluster")
> .set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/hadoop/") 
> .set("spark.hadoop.fs.defaultFS","hdfs://localhost:9000") 
> .set("spark.hadoop.yarn.resourcemanager.hostname","localhost") 
> .set("spark.hadoop.yarn.resourcemanager.scheduler.address","localhost:8030") 
> .set("spark.hadoop.yarn.resourcemanager.address ","localhost:8032") 
> .set("spark.hadoop.yarn.resourcemanager.webapp.address","localhost:8088") 
> .set("spark.hadoop.yarn.resourcemanager.admin.address","localhost:8083")
> SparkSession spark = 
> SparkSession.builder().config(sparkConf).getOrCreate();{code}
> Then I create a dataset with two entries:
>  
> {code:java}
> List<Row> rows = new ArrayList<>();
> rows.add(RowFactory.create("a", "b"));
> rows.add(RowFactory.create("a", "a"));
> StructType structType = new StructType(); 
> structType = structType.add("edge_1", DataTypes.StringType, false);
> structType = structType.add("edge_2", DataTypes.StringType, false); 
> ExpressionEncoder<Row> edgeEncoder = RowEncoder.apply(structType);
> Dataset<Row> edge = spark.createDataset(rows, edgeEncoder);
> {code}
> Then I print the content of the current dataset edge
> {code:java}
>  edge.show();
> {code}
>  
> Then I perform a map transformation on edge that upper-cases the values of
> the two entries and returns the result in edge2:
> {code:java}
> Dataset<Row> edge2 = edge.map(new MyFunction2(), edgeEncoder);{code}
> The following is the code of MyFunction2
> {code:java}
> public class MyFunction2 implements MapFunction<Row, Row>, scala.Serializable
> { 
> private static final long serialVersionUID = 1L;
> @Override public Row call(Row v1) throws Exception { 
> String el1 = v1.get(0).toString().toUpperCase(); 
> String el2 = v1.get(1).toString().toUpperCase(); 
> return RowFactory.create(el1,el2); 
> }
> }{code}
> Finally I show the content of edge2
> {code:java}
> edge2.show();
> {code}
> I can confirm, by checking the Hadoop UI at localhost:8088, that the job is
> submitted correctly. What seems strange is that the first show() returns
> correctly in my console, but the second one fails with the above-mentioned
> error.
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36917) java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in i

2021-10-04 Thread Davide Benedetto (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423918#comment-17423918
 ] 

Davide Benedetto commented on SPARK-36917:
--

I have checked other Jira issues, but unfortunately none of them resolved my
problem. [~hyukjin.kwon], I found a similar issue that you closed, but I can't
find how it was resolved:
https://issues.apache.org/jira/browse/SPARK-5506?jql=text%20~%20%22java.lang.ClassCastException%20Serialization%22

> java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD
> ---
>
> Key: SPARK-36917
> URL: https://issues.apache.org/jira/browse/SPARK-36917
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Submit
>Affects Versions: 3.1.2
> Environment: Ubuntu 20
> Spark3.1.2-hadoop3.2
> Hadoop 3.1
>Reporter: Davide Benedetto
>Priority: Major
>  Labels: hadoop, java, serializable, spark, spark-conf, ubuntu
>
> My spark Job fails with this error:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3) (davben-lubuntu executor 2): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD
> My Ubuntu 20 OS is organized as follows: I have two users, /home/davben
> and /home/hadoop. Under the hadoop user I have installed Hadoop 3.1 and
> spark-3.1.2-hadoop3.2. Both users refer to the java-8-openjdk Java
> installation. The Spark job is launched by user davben from the Eclipse IDE
> as follows:
> First I create the SparkConf and the SparkSession:
>  
> {code:java}
> System.setProperty("hadoop.home.dir", "/home/hadoop/hadoop");
> SparkConf sparkConf = new SparkConf()
> .setAppName("simple")
> .setMaster("yarn")
> .set("spark.executor.memory", "1g")
> .set("deploy.mode", "cluster")
> .set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/hadoop/") 
> .set("spark.hadoop.fs.defaultFS","hdfs://localhost:9000") 
> .set("spark.hadoop.yarn.resourcemanager.hostname","localhost") 
> .set("spark.hadoop.yarn.resourcemanager.scheduler.address","localhost:8030") 
> .set("spark.hadoop.yarn.resourcemanager.address ","localhost:8032") 
> .set("spark.hadoop.yarn.resourcemanager.webapp.address","localhost:8088") 
> .set("spark.hadoop.yarn.resourcemanager.admin.address","localhost:8083")
> SparkSession spark = 
> SparkSession.builder().config(sparkConf).getOrCreate();{code}
> Then I create a dataset with two entries:
>  
> {code:java}
> List<Row> rows = new ArrayList<>();
> rows.add(RowFactory.create("a", "b"));
> rows.add(RowFactory.create("a", "a"));
> StructType structType = new StructType(); 
> structType = structType.add("edge_1", DataTypes.StringType, false);
> structType = structType.add("edge_2", DataTypes.StringType, false); 
> ExpressionEncoder<Row> edgeEncoder = RowEncoder.apply(structType);
> Dataset<Row> edge = spark.createDataset(rows, edgeEncoder);
> {code}
> Then I print the content of the current dataset edge
> {code:java}
>  edge.show();
> {code}
>  
> Then I perform a map transformation on edge that upper-cases the values of
> the two entries and returns the result in edge2:
> {code:java}
> Dataset<Row> edge2 = edge.map(new MyFunction2(), edgeEncoder);{code}
> The following is the code of MyFunction2
> {code:java}
> public class MyFunction2 implements MapFunction<Row, Row>, scala.Serializable
> { 
> private static final long serialVersionUID = 1L;
> @Override public Row call(Row v1) throws Exception { 
> String el1 = v1.get(0).toString().toUpperCase(); 
> String el2 = v1.get(1).toString().toUpperCase(); 
> return RowFactory.create(el1,el2); 
> }
> }{code}
> Finally I show the content of edge2
> {code:java}
> edge2.show();
> {code}
> I can confirm, by checking the Hadoop UI at localhost:8088, that the job is
> submitted correctly. What seems strange is that the first show() returns
> correctly in my console, but the second one fails with the above-mentioned
> error.
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36917) java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in i

2021-10-04 Thread Davide Benedetto (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423867#comment-17423867
 ] 

Davide Benedetto commented on SPARK-36917:
--

[~srowen] Can you please help me with this issue?

> java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD
> ---
>
> Key: SPARK-36917
> URL: https://issues.apache.org/jira/browse/SPARK-36917
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Submit
>Affects Versions: 3.1.2
> Environment: Ubuntu 20
> Spark3.1.2-hadoop3.2
> Hadoop 3.1
>Reporter: Davide Benedetto
>Priority: Major
>  Labels: hadoop, java, serializable, spark, spark-conf, ubuntu
>
> My spark Job fails with this error:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3) (davben-lubuntu executor 2): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD
> My Ubuntu 20 OS is organized as follows: I have two users, /home/davben
> and /home/hadoop. Under the hadoop user I have installed Hadoop 3.1 and
> spark-3.1.2-hadoop3.2. Both users refer to the java-8-openjdk Java
> installation. The Spark job is launched by user davben from the Eclipse IDE
> as follows:
> First I create the SparkConf and the SparkSession:
>  
> {code:java}
> System.setProperty("hadoop.home.dir", "/home/hadoop/hadoop");
> SparkConf sparkConf = new SparkConf()
> .setAppName("simple")
> .setMaster("yarn")
> .set("spark.executor.memory", "1g")
> .set("deploy.mode", "cluster")
> .set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/hadoop/") 
> .set("spark.hadoop.fs.defaultFS","hdfs://localhost:9000") 
> .set("spark.hadoop.yarn.resourcemanager.hostname","localhost") 
> .set("spark.hadoop.yarn.resourcemanager.scheduler.address","localhost:8030") 
> .set("spark.hadoop.yarn.resourcemanager.address ","localhost:8032") 
> .set("spark.hadoop.yarn.resourcemanager.webapp.address","localhost:8088") 
> .set("spark.hadoop.yarn.resourcemanager.admin.address","localhost:8083")
> SparkSession spark = 
> SparkSession.builder().config(sparkConf).getOrCreate();{code}
> Then I create a dataset with two entries:
>  
> {code:java}
> List<Row> rows = new ArrayList<>();
> rows.add(RowFactory.create("a", "b"));
> rows.add(RowFactory.create("a", "a"));
> StructType structType = new StructType(); 
> structType = structType.add("edge_1", DataTypes.StringType, false);
> structType = structType.add("edge_2", DataTypes.StringType, false); 
> ExpressionEncoder<Row> edgeEncoder = RowEncoder.apply(structType);
> Dataset<Row> edge = spark.createDataset(rows, edgeEncoder);
> {code}
> Then I print the content of the current dataset edge
> {code:java}
>  edge.show();
> {code}
>  
> Then I perform a map transformation on edge that upper-cases the values of
> the two entries and returns the result in edge2:
> {code:java}
> Dataset<Row> edge2 = edge.map(new MyFunction2(), edgeEncoder);{code}
> The following is the code of MyFunction2
> {code:java}
> public class MyFunction2 implements MapFunction<Row, Row>, scala.Serializable
> { 
> private static final long serialVersionUID = 1L;
> @Override public Row call(Row v1) throws Exception { 
> String el1 = v1.get(0).toString().toUpperCase(); 
> String el2 = v1.get(1).toString().toUpperCase(); 
> return RowFactory.create(el1,el2); 
> }
> }{code}
> Finally I show the content of edge2
> {code:java}
> edge2.show();
> {code}
> I can confirm, by checking the Hadoop UI at localhost:8088, that the job is
> submitted correctly. What seems strange is that the first show() returns
> correctly in my console, but the second one fails with the above-mentioned
> error.
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18075) UDF doesn't work on non-local spark

2021-10-03 Thread Davide Benedetto (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-18075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423740#comment-17423740
 ] 

Davide Benedetto commented on SPARK-18075:
--

[~cloud_fan] Are you now able to run the job on YARN from the IDE?

> UDF doesn't work on non-local spark
> ---
>
> Key: SPARK-18075
> URL: https://issues.apache.org/jira/browse/SPARK-18075
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Nick Orka
>Priority: Major
>
> I have this issue with Spark 2.0.0 (spark-2.0.0-bin-hadoop2.7.tar.gz).
> According to this ticket https://issues.apache.org/jira/browse/SPARK-9219
> I've given all Spark dependencies the PROVIDED scope. I use exactly the same
> version of Spark in the app as on the Spark server.
> Here is my pom:
> {code:title=pom.xml}
> 
> <properties>
>     <maven.compiler.source>1.6</maven.compiler.source>
>     <maven.compiler.target>1.6</maven.compiler.target>
>     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
>     <scala.version>2.11.8</scala.version>
>     <spark.version>2.0.0</spark.version>
>     <hadoop.version>2.7.0</hadoop.version>
> </properties>
> <dependencies>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-core_2.11</artifactId>
>         <version>${spark.version}</version>
>         <scope>provided</scope>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-sql_2.11</artifactId>
>         <version>${spark.version}</version>
>         <scope>provided</scope>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-hive_2.11</artifactId>
>         <version>${spark.version}</version>
>         <scope>provided</scope>
>     </dependency>
> </dependencies>
> {code}
> As you can see, all Spark dependencies have the provided scope.
> And this is the code for reproduction:
> {code:title=udfTest.scala}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
> import org.apache.spark.sql.{Row, SparkSession}
> /**
>   * Created by nborunov on 10/19/16.
>   */
> object udfTest {
>   class Seq extends Serializable {
> var i = 0
> def getVal: Int = {
>   i = i + 1
>   i
> }
>   }
>   def main(args: Array[String]) {
> val spark = SparkSession
>   .builder()
> .master("spark://nborunov-mbp.local:7077")
> //  .master("local")
>   .getOrCreate()
> val rdd = spark.sparkContext.parallelize(Seq(Row("one"), Row("two")))
> val schema = StructType(Array(StructField("name", StringType)))
> val df = spark.createDataFrame(rdd, schema)
> df.show()
> spark.udf.register("func", (name: String) => name.toUpperCase)
> import org.apache.spark.sql.functions.expr
> val newDf = df.withColumn("upperName", expr("func(name)"))
> newDf.show()
> val seq = new Seq
> spark.udf.register("seq", () => seq.getVal)
> val seqDf = df.withColumn("id", expr("seq()"))
> seqDf.show()
> df.createOrReplaceTempView("df")
> spark.sql("select *, seq() as sql_id from df").show()
>   }
> }
> {code}
> When .master("local") is used, everything works fine. With
> .master("spark://...:7077"), it fails on the line:
> {code}
> newDf.show()
> {code}
> The error is exactly the same:
> {code}
> scala> udfTest.main(Array())
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/Users/nborunov/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/Users/nborunov/.m2/repository/ch/qos/logback/logback-classic/1.1.7/logback-classic-1.1.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/10/19 19:37:52 INFO SparkContext: Running Spark version 2.0.0
> 16/10/19 19:37:52 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/10/19 19:37:52 INFO SecurityManager: Changing view acls to: nborunov
> 16/10/19 19:37:52 INFO SecurityManager: Changing modify acls to: nborunov
> 16/10/19 19:37:52 INFO SecurityManager: Changing view acls groups to: 
> 16/10/19 19:37:52 INFO SecurityManager: Changing modify acls groups to: 
> 16/10/19 19:37:52 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(nborunov); 
> groups with view permissions: Set(); users  with modify permissions: 
> Set(nborunov); groups with modify permissions: Set()
> 16/10/19 19:37:53 INFO Utils: Successfully started service 'sparkDriver' on 
> port 57828.
> 16/10/19 19:37:53 INFO SparkEnv: Registering MapOutputTracker
> 16/10/19 19:37:53 INFO SparkEnv: Registering BlockManagerMaster
> 16/10/19 19:37:53 INFO DiskBlockManager: Created local directory at 
> /private/var/folders/hl/2fv6555n2w92272zywwvpbzhgq/T/blockmgr-f2d05423-b7f7-4525-b41e-10dfe2f88264
> 16/10/19 19:37:53 INFO MemoryStore: MemoryStore started with capacity 2004.6 
> MB
> 16/10/19 19:37:53 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/

[jira] [Created] (SPARK-36917) java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in ins

2021-10-03 Thread Davide Benedetto (Jira)
Davide Benedetto created SPARK-36917:


 Summary: java.lang.ClassCastException: cannot assign instance of 
java.lang.invoke.SerializedLambda to field 
org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of 
org.apache.spark.rdd.MapPartitionsRDD
 Key: SPARK-36917
 URL: https://issues.apache.org/jira/browse/SPARK-36917
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Spark Submit
Affects Versions: 3.1.2
 Environment: Ubuntu 20
Spark3.1.2-hadoop3.2
Hadoop 3.1
Reporter: Davide Benedetto


My spark Job fails with this error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 
3) (davben-lubuntu executor 2): java.lang.ClassCastException: cannot assign 
instance of java.lang.invoke.SerializedLambda to field 
org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of 
org.apache.spark.rdd.MapPartitionsRDD

My Ubuntu 20 OS is organized as follows: I have two users, /home/davben and
/home/hadoop. Under the hadoop user I have installed Hadoop 3.1 and
spark-3.1.2-hadoop3.2. Both users refer to the java-8-openjdk Java installation.
The Spark job is launched by user davben from the Eclipse IDE as follows:
First I create the SparkConf and the SparkSession:

 
{code:java}
System.setProperty("hadoop.home.dir", "/home/hadoop/hadoop");
SparkConf sparkConf = new SparkConf()
.setAppName("simple")
.setMaster("yarn")
.set("spark.executor.memory", "1g")
.set("deploy.mode", "cluster")
.set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/hadoop/") 
.set("spark.hadoop.fs.defaultFS","hdfs://localhost:9000") 
.set("spark.hadoop.yarn.resourcemanager.hostname","localhost") 
.set("spark.hadoop.yarn.resourcemanager.scheduler.address","localhost:8030") 
.set("spark.hadoop.yarn.resourcemanager.address ","localhost:8032") 
.set("spark.hadoop.yarn.resourcemanager.webapp.address","localhost:8088") 
.set("spark.hadoop.yarn.resourcemanager.admin.address","localhost:8083")
SparkSession spark = 
SparkSession.builder().config(sparkConf).getOrCreate();{code}
Then I create a dataset with two entries:

 
{code:java}
List<Row> rows = new ArrayList<>();
rows.add(RowFactory.create("a", "b"));
rows.add(RowFactory.create("a", "a"));
StructType structType = new StructType(); 
structType = structType.add("edge_1", DataTypes.StringType, false);
structType = structType.add("edge_2", DataTypes.StringType, false); 
ExpressionEncoder<Row> edgeEncoder = RowEncoder.apply(structType);
Dataset<Row> edge = spark.createDataset(rows, edgeEncoder);
{code}
Then I print the content of the current dataset edge
{code:java}
 edge.show();
{code}
 

Then I perform a map transformation on edge that upper-cases the values of the
two entries and returns the result in edge2:
{code:java}
Dataset<Row> edge2 = edge.map(new MyFunction2(), edgeEncoder);{code}
The following is the code of MyFunction2
{code:java}
public class MyFunction2 implements MapFunction<Row, Row>, scala.Serializable {
private static final long serialVersionUID = 1L;

@Override public Row call(Row v1) throws Exception { 
String el1 = v1.get(0).toString().toUpperCase(); 
String el2 = v1.get(1).toString().toUpperCase(); 
return RowFactory.create(el1,el2); 
}
}{code}
Finally I show the content of edge2
{code:java}
edge2.show();
{code}
I can confirm, by checking the Hadoop UI at localhost:8088, that the job is
submitted correctly. What seems strange is that the first show() returns
correctly in my console, but the second one fails with the above-mentioned
error.
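
As a hedged note, consistent with the resolution comment at the top of this
thread: this particular ClassCastException typically appears when the executor
JVM cannot load the class that defines the mapped function, so the
SerializedLambda placeholder is never resolved back into a scala.Function3.
A minimal sketch of shipping the application jar to the executors (the jar
path below is hypothetical):
{code:java}
// Hedged sketch, not the reporter's code: ship the application jar so the
// YARN executors can deserialize MyFunction2. The path is hypothetical.
SparkConf sparkConf = new SparkConf()
        .setAppName("simple")
        .setMaster("yarn")
        .setJars(new String[] { "/home/davben/workspace/simple/target/simple.jar" });
SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
{code}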

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36875) 21/09/28 11:18:51 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resourc

2021-09-28 Thread Davide Benedetto (Jira)
Davide Benedetto created SPARK-36875:


 Summary: 21/09/28 11:18:51 WARN YarnScheduler: Initial job has not 
accepted any resources; check your cluster UI to ensure that workers are 
registered and have sufficient resources
 Key: SPARK-36875
 URL: https://issues.apache.org/jira/browse/SPARK-36875
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Spark Submit
Affects Versions: 3.1.2
 Environment: Eclipse
Hadoop 3.3

Spark3.1.2-hadoop3.2

Dependencies

 
{noformat}

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.1.2</version>
    <exclusions>
        <exclusion>
            <artifactId>janino</artifactId>
            <groupId>org.codehaus.janino</groupId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.0.8</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-yarn_2.12</artifactId>
    <version>3.1.2</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.12.13</version>
</dependency>

Environment variables set in Eclipse: SPARK_HOME path/to/my/sparkfolder

OS: Linux with Ubuntu 20
The test is launched from my first user, davben.
The Spark folder and Hadoop are under my second user, hadoop.{noformat}
 

 
Reporter: Davide Benedetto
 Fix For: 3.1.2


Hi, I am running a Spark job on YARN programmatically from the Eclipse IDE.
Here I:

1: Open the SparkSession, passing a SparkConf as input parameter:

 
{quote} 
{code:java}
System.setProperty("hadoop.home.dir", "/home/hadoop/hadoop");
System.setProperty("SPARK_YARN_MODE", "yarn");
System.setProperty("HADOOP_USER_NAME", "hadoop");
SparkConf sparkConf = new 
SparkConf().setAppName("simpleTest2").setMaster("yarn") 
.set("spark.executor.memory", "1g")
.set("deploy.mode", "cluster")
.set("spark.yarn.stagingDir", "hdfs://localhost:9000/user/hadoop/")
.set("spark.yarn.am.memory", "512m") 
.set("spark.dynamicAllocation.minExecutors","1") 
.set("spark.dynamicAllocation.maxExecutors","40") 
.set("spark.dynamicAllocation.initialExecutors","2")         
.set("spark.shuffle.service.enabled", "true")         
.set("spark.dynamicAllocation.enabled", "false")
.set("spark.cores.max", "1")
 .set("spark.yarn.executor.memoryOverhead", "500m")         
.set("spark.executor.instances","2")
.set("spark.executor.memory","500m")
.set("spark.num.executors","2")
.set("spark.executor.cores","1")
.set("spark.worker.instances","1")
.set("spark.worker.memory","512m")
.set("spark.worker.max.heapsize","512m")
.set("spark.worker.cores","1")
.set("maximizeResourceAllocation", "true") 
.set("spark.yarn.nodemanager.resource.cpu-vcores","4") 
.set("spark.yarn.submit.file.replication", "1")
SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate(); 
{code}
 
{quote}
2: Create a dataset of Rows and show them:
{code:java}
List<Row> rows = new ArrayList<>();
rows.add(RowFactory.create("a", "b"));
rows.add(RowFactory.create("b", "c"));
rows.add(RowFactory.create("a", "a"));
StructType structType = new StructType();
structType = structType.add("edge_1", DataTypes.StringType, false);
structType = structType.add("edge_2", DataTypes.StringType, false);
ExpressionEncoder<Row> edgeEncoder = RowEncoder.apply(structType);
Dataset<Row> edge = spark.createDataset(rows, edgeEncoder);
edge.show();
 
{code}
{{Up to this point everything is OK: the job is submitted to Hadoop and the
rows are shown correctly.}}

{{3: I perform a map that upper-cases the elements in each row}}

 
{quote}
{code:java}
Dataset<Row> edge2 = edge.map(new MyFunction2(), edgeEncoder);{code}
{quote}
 
{quote}
{code:java}
public static class MyFunction2 implements MapFunction<Row, Row> {
    private static final long serialVersionUID = 1L;

    @Override
    public Row call(Row v1) throws Exception {
        String el1 = v1.get(0).toString().toUpperCase();
        String el2 = v1.get(1).toString().toUpperCase();
        return RowFactory.create(el1, el2);
    }
}
{code}
{quote}
{{4: Then I show the dataset after the map is performed}}
{quote}
{code:java}
edge2.show();{code}
{quote}
{{And precisely here the log loops, saying:}}

{{21/09/28 11:18:51 WARN YarnScheduler: Initial job has not accepted any 
resources; check your cluster UI to ensure that workers are registered and have 
sufficient resources}}

 
{quote}{{Here is the full log}}

{noformat}
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/09/28 03:05:16 WARN Utils: Your hostname, davben-lubuntu resolves to a loopback address: 127.0.1.1; using 192.168.1.36 instead (on interface wlo1)
21/09/28 03:05:16 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to ano
{noformat}
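
For context, a minimal sketch (not the reporter's configuration, all values are
assumptions) of resource settings that usually fit a small single-node YARN
setup; it assumes the NodeManager advertises at least roughly 1.5 GB of memory
and one vcore (yarn.nodemanager.resource.memory-mb / .cpu-vcores), otherwise
this warning keeps repeating because no container can be allocated:
{code:java}
// Hedged sketch: request a single small executor so both the ApplicationMaster
// and the executor containers fit within what the NodeManager advertises.
SparkConf smallConf = new SparkConf()
        .setAppName("simpleTest2")
        .setMaster("yarn")
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.executor.instances", "1")
        .set("spark.executor.cores", "1")
        .set("spark.executor.memory", "512m")
        .set("spark.executor.memoryOverhead", "384m")
        .set("spark.yarn.am.memory", "512m");
SparkSession smallSpark = SparkSession.builder().config(smallConf).getOrCreate();
{code}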

[jira] [Commented] (SPARK-17722) YarnScheduler: Initial job has not accepted any resources

2021-09-27 Thread Davide Benedetto (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420999#comment-17420999
 ] 

Davide Benedetto commented on SPARK-17722:
--

Hi Partha,
I have the same issue. Which YARN configurations have you set?



> YarnScheduler: Initial job has not accepted any resources
> -
>
> Key: SPARK-17722
> URL: https://issues.apache.org/jira/browse/SPARK-17722
> Project: Spark
>  Issue Type: Bug
>Reporter: Partha Pratim Ghosh
>Priority: Major
>
> Connected Spark in YARN mode from Eclipse (Java). On trying to run a task it
> gives the following:
> YarnScheduler: Initial job has not accepted any resources; check your cluster
> UI to ensure that workers are registered and have sufficient resources. The
> request reaches the Hadoop cluster scheduler, and from there we can see the
> job in the Spark UI, but there it says that no task has been assigned for
> this job.
> The same code runs from spark-submit, where we need to remove the following
> lines:
> System.setProperty("java.security.krb5.conf", "C:\\xxx\\krb5.conf");
>   
>   org.apache.hadoop.conf.Configuration conf = new 
>   org.apache.hadoop.conf.Configuration();
>   conf.set("hadoop.security.authentication", "kerberos");
>   UserGroupInformation.setConfiguration(conf);
> Following is the configuration - 
> import org.apache.hadoop.security.UserGroupInformation;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.DataFrame;
> import org.apache.spark.sql.SQLContext;
> public class TestConnectivity {
>   /**
>* @param args
>*/
>   public static void main(String[] args) {
>   System.setProperty("java.security.krb5.conf", 
> "C:\\xxx\\krb5.conf");
>   
>   org.apache.hadoop.conf.Configuration conf = new 
>   org.apache.hadoop.conf.Configuration();
>   conf.set("hadoop.security.authentication", "kerberos");
>   UserGroupInformation.setConfiguration(conf);
>SparkConf config = new SparkConf().setAppName("Test Spark ");
>config = config.setMaster("yarn-client");
>config .set("spark.dynamicAllocation.enabled", "false");
>config.set("spark.executor.memory", "2g");
>config.set("spark.executor.instances", "1");
>config.set("spark.executor.cores", "2");
>//config.set("spark.driver.memory", "2g");
>//config.set("spark.driver.cores", "1");
>/*config.set("spark.executor.am.memory", "2g");
>config.set("spark.executor.am.cores", "2");*/
>config.set("spark.cores.max", "4");
>config.set("yarn.nodemanager.resource.cpu-vcores","4");
>config.set("spark.yarn.queue","root.root");
>/*config.set("spark.deploy.defaultCores", "2");
>config.set("spark.task.cpus", "2");*/
>config.set("spark.yarn.jar", 
> "file:/C:/xxx/spark-assembly_2.10-1.6.0-cdh5.7.1.jar");
>   JavaSparkContext sc = new JavaSparkContext(config);
>   SQLContext sqlcontext = new SQLContext(sc);
>   DataFrame df = sqlcontext.jsonFile(logFile);
>   JavaRDD logData = 
> sc.textFile("sparkexamples/Employee.json").cache();
>  DataFrame df = sqlcontext.jsonRDD(logData);
>  
>   df.show();
>   df.printSchema();
>   
>   //UserGroupInformation.setConfiguration(conf);
>   
>   }
> }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org