Re: OutOfDirectMemoryError for Spark 2.2

2018-03-07 Thread Chawla,Sumit
Hi, anybody got any pointers on this one? Regards, Sumit Chawla. On Tue, Mar 6, 2018 at 8:58 AM, Chawla,Sumit wrote: > No, this is the only stack trace I get. I have tried DEBUG but didn't notice much of a log change. > Yes, I have tried bumping …
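For readers of the archive, a minimal sketch of the "bumping" usually meant here, assuming the error originates in Netty's direct-memory pool on the executors and a YARN deployment; the sizes, class name, and jar are placeholders:

    spark-submit \
      --conf "spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=2g" \
      --conf "spark.yarn.executor.memoryOverhead=2048" \
      --class com.example.MyApp myapp.jar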

Is there a way to catch exceptions on executor level

2018-03-07 Thread Chethan Bhawarlal
Hi Dev, I am doing Spark operations at the RDD level for each row, like this:

    private def obj(row: org.apache.spark.sql.Row): Put = {
      row.schema.fields.foreach(x => {
        x.dataType match {
          case StringType => // some operation

So, when I get some empty or garbage value, my …
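A sketch of one way to handle this: wrap the per-row conversion in scala.util.Try so a single bad value drops that row instead of failing the whole executor task (safeConvert is a hypothetical helper; obj is the poster's function):

    import org.apache.spark.sql.Row
    import scala.util.Try

    // Returns None when the conversion throws, Some(result) otherwise.
    def safeConvert[T](row: Row)(convert: Row => T): Option[T] =
      Try(convert(row)).toOption

    // usage: keep only the rows that converted cleanly
    // rdd.flatMap(r => safeConvert(r)(obj))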

Spark Streaming logging on Yarn : issue with rolling in yarn-client mode for driver log

2018-03-07 Thread chandan prakash
Hi All, I am running my Spark Streaming job in yarn-client mode. I want to enable rolling and aggregation in the node manager container. I am using the configs suggested in the Spark doc: …
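One wrinkle worth noting: in yarn-client mode the driver runs outside any NodeManager container, so container-level rolling and aggregation only cover the executors. A minimal log4j.properties sketch for rolling the driver log locally (paths and sizes are placeholders):

    log4j.rootCategory=INFO, rolling
    log4j.appender.rolling=org.apache.log4j.RollingFileAppender
    log4j.appender.rolling.File=/var/log/spark/driver.log
    log4j.appender.rolling.MaxFileSize=50MB
    log4j.appender.rolling.MaxBackupIndex=5
    log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
    log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

passed to spark-submit with --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties".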

Re: Spark StreamingContext Question

2018-03-07 Thread रविशंकर नायर
Got it, thanks. On Wed, Mar 7, 2018 at 4:32 AM, Gerard Maas wrote: > Hi, > You can run as many jobs in your cluster as you want, provided you have enough capacity. > The one-streaming-context constraint is per job. > You can submit several jobs for Flume and some …

Re: Reading kafka and save to parquet problem

2018-03-07 Thread Junfeng Chen
I have tried to use readStream and writeStream, but it throws a "Uri without authority: hdfs:/data/_spark_metadata" exception, which is not seen in normal read mode. The parquet path I specified is hdfs:///data. Regards, Junfeng Chen. On Thu, Mar 8, 2018 at 9:38 AM, naresh Goud …
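A common workaround for that exception is to spell out the full URI, authority included, since the hdfs:///data shorthand can get collapsed to hdfs:/data during path resolution. A sketch (the namenode host/port and paths are placeholders):

    df.writeStream
      .format("parquet")
      .option("checkpointLocation", "hdfs://namenode:8020/checkpoints/kafka-to-parquet")
      .option("path", "hdfs://namenode:8020/data")
      .start()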

Re: Reading kafka and save to parquet problem

2018-03-07 Thread naresh Goud
Change it to readStream instead of read, as below:

    val df = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
      .option("subscribe", "topic1")
      .load()

Check if this is helpful …
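To round that out, the write side has to be a streaming write as well; a batch-style df.write on a streaming Dataset fails with an AnalysisException. A sketch (paths are placeholders):

    val query = df
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("parquet")
      .option("checkpointLocation", "/tmp/checkpoints/kafka-parquet")
      .option("path", "/data/out")
      .start()

    query.awaitTermination()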

Reading kafka and save to parquet problem

2018-03-07 Thread Junfeng Chen
I am struggling with reading data from Kafka and saving it to a Parquet file on HDFS using Spark Streaming, following this post: https://stackoverflow.com/questions/45827664/read-from-kafka-and-write-to-hdfs-in-parquet. My code is similar to the following:

    val df = spark
      .read …

Re: dependencies conflict in oozie spark action for spark 2

2018-03-07 Thread Lian Jiang
I found the following version inconsistencies between the Oozie and Spark 2 jars:

    jackson-core-2.4.4.jar         (oozie)  vs.  jackson-core-2.6.5.jar         (spark2)
    jackson-databind-2.4.4.jar     (oozie)  vs.  jackson-databind-2.6.5.jar     (spark2)
    jackson-annotations-2.4.0.jar  (oozie)  vs.  jackson-annotations-2.6.5.jar  (spark2)

I removed the lower version jars …
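If removing jars from the Oozie classpath is not an option, another approach is to shade Jackson inside the job's fat jar so the two versions cannot collide. A sketch with sbt-assembly, assuming an sbt build (which the thread doesn't confirm):

    // build.sbt
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.fasterxml.jackson.**" -> "shadedjackson.@1").inAll
    )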

Issues with large schema tables

2018-03-07 Thread Ballas, Ryan W
Hello All, Our team is having a lot of issues with the Spark API, particularly with large-schema tables. We currently have a program written in Scala that utilizes the Apache Spark API to create two tables from raw files. We have one particularly large raw data file that contains around …

Spark-submit Py-files with EMR add step?

2018-03-07 Thread Afshin, Bardia
I’m writing this email to reach out to the community to demystify the py-files parameter when working with spark-submit and Python projects. Currently I have a project, say:

    Src/
      Main.py
      Modules/module1.py

When I zip up the src directory and submit it to Spark via EMR add step, the …
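A sketch of the usual pattern (the zip name is a placeholder; EMR's add-step ultimately forwards these arguments to spark-submit): the entry point is passed on its own, and --py-files ships only the importable modules, so the driver can do import Modules.module1 without unpacking anything by hand.

    zip -r modules.zip Modules/
    spark-submit --py-files modules.zip Main.py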

Re: What is the right syntax for self joins in Spark 2.3.0?

2018-03-07 Thread kant kodali
It looks to me that the StateStore described in this doc actually has a full outer join, and every other join is a filter of that. Also, the doc talks about update mode, but it looks like Spark 2.3 ended up with append …
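For reference, the plain self-join syntax that works on batch Datasets aliases both sides so column references stay unambiguous (df and the id column are placeholders; stream-stream support in 2.3 depends on the join type and watermarks):

    import org.apache.spark.sql.functions.col

    val left  = df.as("l")
    val right = df.as("r")
    val joined = left.join(right, col("l.id") === col("r.id"), "inner")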

Re: Do values adjacent to exploded columns get duplicated?

2018-03-07 Thread Anshul Sachdeva
All the columns except the exploded column will be duplicated after explode, since each value in the exploded column's array is joined with the other columns. Hope that clears it up. Regards, Ansh. On Mar 7, 2018 4:54 PM, "Vitaliy Pisarev" wrote: > This is a fairly basic question but I …
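A small sketch reproducing the behaviour (assumes a SparkSession named spark):

    import org.apache.spark.sql.functions.explode
    import spark.implicits._

    val df = Seq((1, Seq("x1", "x2", "x3", "x4"))).toDF("a", "b")
    df.select($"a", explode($"b").as("b")).show()
    // +---+---+
    // |  a|  b|
    // +---+---+
    // |  1| x1|   <- the value of a repeats once
    // |  1| x2|      per exploded element
    // |  1| x3|
    // |  1| x4|
    // +---+---+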

Do values adjacent to exploded columns get duplicated?

2018-03-07 Thread Vitaliy Pisarev
This is a fairly basic question but I did not find an answer to it anywhere online. Suppose I have the following data frame (a and b are column names):

    a | b
    -----------------
    1 | [x1,x2,x3,x4]   # this is an array

Now I explode column b and logically get:

    a | b
    …

Thrift server - ODBC

2018-03-07 Thread Paulo Maia da Costa Ribeiro
Hello, I have Spark 2.2 installed, but not Hive, and I would like to expose Spark tables through ODBC. I am able to start the Thrift server with apparently no errors, and my ODBC driver is able to connect to the Thrift server, but it can't see any Spark tables. Do I need to have Hive installed in order to …
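One thing that often explains this: tables registered in one Spark session are not visible to a separately launched Thrift server unless both share a metastore. A sketch of starting the Thrift server inside the same session that registers the tables, assuming a Spark build with Hive/Thrift-server support (the path and view name are placeholders):

    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    spark.read.parquet("/data/my_table").createOrReplaceTempView("my_table")
    HiveThriftServer2.startWithContext(spark.sqlContext)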

Re: Spark StreamingContext Question

2018-03-07 Thread Gerard Maas
Hi, You can run as many jobs in your cluster as you want, provided you have enough capacity. The one-streaming-context constraint is per job. You can submit several jobs for Flume and some others for Twitter, Kafka, etc. If you are getting started with streaming with Spark, I'd recommend you to …

Re: CachedKafkaConsumer: CachedKafkaConsumer is not running in UninterruptibleThread warning

2018-03-07 Thread Tathagata Das
These issues have likely been solved in later versions. Please use the latest release, Spark 2.3.0. On Tue, Mar 6, 2018 at 5:11 PM, Junfeng Chen wrote: > Spark 2.1.1. > Actually it is a warning rather than an exception, so there is no stack trace. Just many occurrences of this line: …

Re: Spark StreamingContext Question

2018-03-07 Thread sagar grover
Hi, You can have multiple streams under the same streaming context and process them accordingly. With regards, Sagar Grover, Phone - 7022175584. On Wed, Mar 7, 2018 at 9:26 AM, ☼ R Nair (रविशंकर नायर) < ravishankar.n...@gmail.com> wrote: > Hi all, > I understand from the documentation that only one …
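A sketch of what that looks like with the DStream API (hosts, ports, and the batch interval are placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("multi-stream"), Seconds(10))

    // two independent input streams under one context
    val s1 = ssc.socketTextStream("host1", 9999)
    val s2 = ssc.socketTextStream("host2", 9998)

    s1.foreachRDD(rdd => println(s"stream 1: ${rdd.count()} records"))
    s2.foreachRDD(rdd => println(s"stream 2: ${rdd.count()} records"))

    ssc.start()
    ssc.awaitTermination()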