Re: Spark Job Fails with Unknown Error writing to S3 from AWS EMR

2020-07-22 Thread Shriraj Bhardwaj
We faced a similar situation with JRE 8u262; try reverting back...

On Thu, Jul 23, 2020, 5:18 AM koti reddy wrote:
> Hi,
>
> Can someone help to resolve this issue?
> Thank you in advance.
>
> Error logs:
>
> java.io.EOFException: Unexpected EOF while trying to read response from server

Spark Job Fails with Unknown Error writing to S3 from AWS EMR

2020-07-22 Thread koti reddy
Hi,

Can someone help to resolve this issue? Thank you in advance.

Error logs:

java.io.EOFException: Unexpected EOF while trying to read response from server
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:402)
    at

Re: Spark DataFrame Creation

2020-07-22 Thread Andrew Melo
Hi Mark,

On Wed, Jul 22, 2020 at 4:49 PM Mark Bidewell wrote:
>
> Sorry if this is the wrong place for this. I am trying to debug an issue
> with this library:
> https://github.com/springml/spark-sftp
>
> When I attempt to create a dataframe:
>
> spark.read.

Re: Spark DataFrame Creation

2020-07-22 Thread Sean Owen
You'd probably do best to ask that project, but from scanning the source code, that looks like how it's meant to work. It downloads to a temp file on the driver, then copies that file to distributed storage, then returns a DataFrame for the copy. I can't see how it would be implemented directly over SFTP as there
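
The three steps described above (download to a temp file on the driver, copy to shared storage, read from the copy) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the spark-sftp implementation: `fetch_remote_file` and `stage_for_cluster` are hypothetical names, the "download" just writes sample CSV text instead of talking to an SFTP server, and an ordinary local directory stands in for distributed storage.

```python
import shutil
import tempfile
from pathlib import Path


def fetch_remote_file(remote_name: str, local_path: Path) -> None:
    """Hypothetical stand-in for the SFTP download: writes sample CSV text."""
    local_path.write_text("id,name\n1,alpha\n2,beta\n")


def stage_for_cluster(remote_name: str, shared_dir: Path) -> Path:
    """Download to a driver-local temp file, then copy it to shared storage.

    `shared_dir` is an assumption standing in for distributed storage
    (for example an HDFS or S3 location in a real deployment).
    """
    shared_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        # Step 1: fetch the remote file into a temp directory on the driver.
        local_copy = Path(tmp) / Path(remote_name).name
        fetch_remote_file(remote_name, local_copy)
        # Step 2: copy the temp file to storage that every executor can reach.
        staged = shared_dir / local_copy.name
        shutil.copyfile(local_copy, staged)
    # Step 3: a reader would now build the DataFrame from the staged path.
    return staged
```

With a layout like this, the temp file disappears once the copy is done, and only the staged path needs to be visible to the executors.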

Spark DataFrame Creation

2020-07-22 Thread Mark Bidewell
Sorry if this is the wrong place for this. I am trying to debug an issue with this library: https://github.com/springml/spark-sftp

When I attempt to create a dataframe:

spark.read.
  format("com.springml.spark.sftp").
  option("host", "...").
  option("username",

How to optimize the configuration and/or code to solve the cache overloading issue?

2020-07-22 Thread Yong Yuan
I ran into trouble using Spark Structured Streaming. The usercache is continuously consumed by the join operation and never released. How can I optimize the configuration and/or code to solve this problem? Spark cluster on AWS EMR: 1 master node, m4.xlarge, 4 cores, 16 GB; 2 core nodes,

Spark 3 connect to Hive 1.2

2020-07-22 Thread Ashika Umanga
Greetings,

Our standalone Spark 3 cluster is trying to connect to a Hadoop 2.6 cluster running Hive server 1.2 (/usr/hdp/2.6.2.0-205/hive/lib/hive-service-1.2.1000.2.6.2.0-205.jar)

import org.apache.spark.sql.functions._
import java.sql.Timestamp

val df1 = spark.createDataFrame(
  Seq(