Hello, I am a Spark user. I start Spark with the "spark-shell.cmd" command in
Windows cmd. The first startup is normal, but after I use "Ctrl+C" to
force-close the Spark window, it can no longer start normally. The error
message is as follows:
Hi Team, we are facing an issue in production where we frequently get "Still have 1 request outstanding when connection with the hostname was closed" and "connection reset by peer" errors, as well as warnings like "failed to remove cache rdd" or "failed to remove broadcast variable". Please help us with how to
Hello!
When using Spark Standalone with Spark 2.4.4 / 3.0.0, we are seeing our
standalone Spark "applications" time out and show as "Finished" after around
an hour.
Here is a screenshot from the Spark master before it's marked as finished.
Hello,
We are considering whether to use Hadoop or Kubernetes as the cluster
manager for Spark. We would prefer Hadoop 3 because of its native
support for scheduling GPUs.
Although there is a Spark 3.0.0 preview2 version available that is
pre-built for Hadoop 3, I would like to know
uch on Oracle? How many partitions do you have on the Oracle side?
>
> On 06.04.2019 at 16:59, Lian Jiang wrote:
>
> Hi,
>
> My spark job writes into oracle db using:
>
> df.coalesce(10).write.format("jdbc").option("url", url)
>   .option("driver"
usamy Thirupathy wrote:
>
> Hi Jorn,
>
> Thanks for sharing the different options. Yes, we are trying to build a
> generic tool for Hive to Spark export.
> FYI, currently we are using Sqoop; we are trying to migrate from Sqoop to
> Spark.
>
> Thanks
> -G
>
> On
> -------------------------------------------
> +----+
> |aver|
> +----+
> | 3.0|
> +----+
>
> -------------------------------------------
> Batch: 1
> -------------------------------------------
> +----+
> |aver|
> +----+
> | 4.0|
> +----+
>
>
> Updated Code -
>
Hi,
I have a simple Java program to read data from Kafka using Spark Streaming.
When I run it from Eclipse on my Mac, it connects to ZooKeeper and the
bootstrap nodes, but it does not display any data. It does not give any
error; it just shows:
18/01/16 20:49:15 INFO Executor: Finished task
Dear Friends,
I am new to Spark DataFrames. My requirement is: I have a dataframe1 that
contains today's records and a dataframe2 that contains yesterday's records.
I need to compare today's records with yesterday's records and find the new
records which do not exist in the yeste
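(A common approach, as a minimal sketch: a left anti join on the key columns,
assuming a hypothetical key column "id" and Spark 2.0+:

// Rows in today's dataframe whose "id" does not appear in yesterday's.
val newRecords = dataframe1.join(dataframe2, Seq("id"), "left_anti")

// If entire rows are comparable, a set difference also works:
val newRows = dataframe1.except(dataframe2)
)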
I need it cached to improve throughput; I only hope it can be refreshed once a
day, not every batch.
> On Nov 13, 2017, at 4:49 PM, Burak Yavuz wrote:
>
> I think if you don't cache the jdbc table, then it should auto-refresh.
>
> On Mon, Nov 13, 2017 at 1:2
Hi,
I'm using Structured Streaming (Spark 2.2) to receive Kafka messages, and it
works great. The thing is, I need to join the Kafka messages with a relatively
static table stored in a MySQL database (let's call it metadata here).
So is it possible to reload the metadata table after some time interval (like
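(One sketch of a workaround, assuming the metadata table is read through the
DataFrame JDBC source with placeholder connection details: cache the static
side, and periodically unpersist and re-cache it so the next micro-batch
reloads it from MySQL.

val metadata = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb")   // placeholder
  .option("dbtable", "metadata")
  .option("user", "user").option("password", "password")
  .load()
metadata.cache()   // avoids re-reading MySQL on every micro-batch

// From a scheduled thread, e.g. once a day:
// metadata.unpersist(blocking = true)
// metadata.cache()   // the next use re-reads from MySQL and re-caches
)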
How much memory have you allocated to the driver? The driver stores some state
for tracking the task, stage and job history that you can see in the Spark
console; it does take up a significant portion of the heap, anywhere from
200MB - 1G, depending on your map-reduce steps.
Either way that is a good
hence
completing tasks quicker, and let the Spark scheduler (which is low-cost and
efficient based on my observation; it is never the bottleneck) do the work
of distributing the work among the tasks.
I have experimented with 1 task per core, 2-3 tasks per core, and all the
way up to 20+ tasks per core
Spark has more support for Scala; by that I mean more APIs are available
for Scala compared to Python or Java. Also, Scala code will be more concise
and easier to read. Java is very verbose.
On Thu, Feb 9, 2017 at 10:21 PM, Irving Duran
wrote:
> I would say Java, since it will be somewhat simi
one has solved similar
problem.
Thanks,
Bharath
On Mon, Oct 31, 2016 at 11:40 AM, Spark User
wrote:
> Trying again. Hoping to find some help in figuring out the performance
> bottleneck we are observing.
>
> Thanks,
> Bharath
>
> On Sun, Oct 30, 2016 at 11:58 AM, Spark User
>
Hi, the source file I have is on a local machine and it's pretty huge, around
150 GB. How do I go about it?
On Sun, Nov 20, 2016 at 8:52 AM, Steve Loughran
wrote:
>
> On 19 Nov 2016, at 17:21, vr spark wrote:
>
> Hi,
> I am looking for scala or python code samples to covert local ts
Hi All,
It seems like the heap usage for
org.apache.spark.deploy.yarn.ApplicationMaster keeps growing continuously.
The driver crashes with OOM eventually.
More details:
I have a spark streaming app that runs on spark-2.0. The
spark.driver.memory is 10G and spark.yarn.driver.memoryOverhead is
Hi,
I am looking for Scala or Python code samples to convert a local TSV file to
an ORC file and store it on distributed cloud storage (OpenStack).
So I need these 3 samples; please suggest (a sketch follows below).
1. read tsv
2. convert to orc
3. store on distributed cloud storage
thanks
VR
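(A rough sketch of those three steps on Spark 1.x, assuming a HiveContext for
ORC support; the paths, the Swift container URI and the two-column layout are
illustrative assumptions:

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)

// 1. read tsv from the local filesystem
val rows = sc.textFile("file:///path/to/local.tsv")
  .map(_.split("\t"))
  .map(a => (a(0), a(1)))                  // adjust to the real column layout

// 2. convert to a DataFrame, then ORC
val df = sqlContext.createDataFrame(rows).toDF("col1", "col2")

// 3. store on OpenStack Swift (needs the hadoop-openstack jars configured)
df.write.orc("swift://container.provider/path/orc-output")
)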
Trying again. Hoping to find some help in figuring out the performance
bottleneck we are observing.
Thanks,
Bharath
On Sun, Oct 30, 2016 at 11:58 AM, Spark User
wrote:
> Hi All,
>
> I have a UDAF that seems to perform poorly when its input is skewed. I
> have been debugg
me goes down to 4 minutes.
So I am trying to understand: why is there such a big performance
difference? What in the UDAF causes the processing time to increase by orders
of magnitude when there is skew in the data, as observed above?
Any insight from spark developers, contributors, or anyone else who
Hi All,
I'm trying to create a Dataset from an RDD and do groupBy on the Dataset. The
groupBy stage runs with 200 partitions, although the RDD had 5000
partitions. I also seem to have no way to change those 200 partitions on the
Dataset to some other large number. This seems to be affecting the
parall
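(The 200 comes from the shuffle partition setting, which DataFrame/Dataset
shuffles use regardless of the input RDD's partitioning; a sketch of raising
it, where spark.sql.shuffle.partitions is the real conf key:

// Spark 2.x SparkSession API; on 1.6 use sqlContext.setConf(...)
spark.conf.set("spark.sql.shuffle.partitions", "5000")
// or at submit time: --conf spark.sql.shuffle.partitions=5000
)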
Hi,
I have a continuous REST API stream which keeps spitting out data in the form
of JSON.
I access the stream using Python: requests.get(url, stream=True,
headers=headers).
I want to receive it using Spark and do further processing. I am not sure
of the best way to receive it in Spark.
What are
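(On the DStream side, one option is a custom receiver that pulls the HTTP
stream and stores each JSON line. A minimal Scala sketch, since the Receiver
API is a Scala/Java API; the URL handling here is simplified and illustrative:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class RestStreamReceiver(url: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // Pull the stream on a background thread so onStart returns quickly.
    new Thread("rest-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {}   // the read loop below checks isStopped()

  private def receive(): Unit = {
    val source = scala.io.Source.fromURL(url)
    try {
      for (line <- source.getLines() if !isStopped()) store(line)
    } finally source.close()
    if (!isStopped()) restart("stream ended, reconnecting")
  }
}

// usage: val jsonLines = ssc.receiverStream(new RestStreamReceiver("http://host/stream"))
)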
tr3Counts = ds.groupBy('keyAttr', 'attr3').count()
//similar counts for 20 attributes
//code to merge attr1Counts and attr2Counts and attr3Counts
//translate it to desired output format and save the result.
Some more details:
1) The application is a spark streaming application
Hi Jacek/All,
I restarted my terminal and then tried spark-submit, and I am again getting
those errors. How do I see how many "runtimes" are running, and how can I have
only one? Somehow my Spark 1.6 and Spark 2.0 are conflicting. How do I fix
it?
I installed Spark 1.6 earlier using this
Hi,
I use the Scala IDE for Eclipse. I usually run jobs against my local Spark
installed on my Mac, then export the jars, copy them to my company's Spark
cluster, and run spark-submit on it.
This works fine.
But I want to run the jobs from the Scala IDE directly, using my company's
Spark cluster
Yes, I have both Spark 1.6 and Spark 2.0.
I unset the SPARK_HOME environment variable and pointed spark-submit to 2.0.
It's working now.
How do I uninstall/remove Spark 1.6 from my Mac?
Thanks
On Sun, Sep 25, 2016 at 4:28 AM, Jacek Laskowski wrote:
> Hi,
>
> Can you execute run-exampl
Hi,
I have this simple Scala app which works fine when I run it as a Scala
application from the Scala IDE for Eclipse.
But when I export it as a jar and run it with spark-submit I get the error
below. Please suggest.
bin/spark-submit --class com.x.y.vr.spark.first.SimpleApp test.jar
16/09/24 23
Hi Ted/All,
I did the below to get the full stack trace; see below. I am not able to
understand the root cause.
except Exception as error:
    traceback.print_exc()
and this is what I get:
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/context.py",
line 580, in sql
return Data
pting to accept an unknown
offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O168676558.
and many more lines like this on the screen with a similar message
On Wed, Aug 17, 2016 at 9:08 AM, Ted Yu wrote:
> Please include user@ in your reply.
>
> Can you reveal the snippet of hive sql
W0816 23:17:01.984846 16360 sched.cpp:1195] Attempting to accept an unknown
offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910492
W0816 23:17:01.984987 16360 sched.cpp:1195] Attempting to accept an unknown
offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910493
W0816 23:17:01.985124 16360 sched.cpp
Hi,
I am getting an error in the scenario below. Please suggest.
I have a virtual view in Hive:
view name: log_data
It has 2 columns:
query_map map
parti_date int
Here is my snippet for the Spark data frame:
res=sqlcont.sql("select parti_date FROM log_data
Hi Experts,
Please suggest
On Thu, Aug 11, 2016 at 7:54 AM, vr spark wrote:
>
> I have data which is json in this format
>
> myList: array
> |||-- elem: struct
> ||||-- nm: string (nullable = true)
> ||||-- vList: a
I have data which is json in this format
myList: array
|||-- elem: struct
||||-- nm: string (nullable = true)
||||-- vList: array (nullable = true)
|||||-- element: string (containsNull = true)
From my Kafka stream, I created a dataframe usin
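(To flatten a schema like the above, one sketch is to explode each array level
in turn; the column names are taken from the schema shown:

import org.apache.spark.sql.functions.{col, explode}

val flat = df
  .select(explode(col("myList")).as("elem"))            // one row per struct
  .select(col("elem.nm").as("nm"),
          explode(col("elem.vList")).as("v"))           // one row per inner string
)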
Hi,
I would like to know the steps to connect to Spark SQL from the Spring
framework (Web UI),
and also how to run and deploy the web application.
On Tue, Jul 26, 2016 at 12:05 PM, Cody Koeninger wrote:
> Have you tried filtering out corrupt records with something along the
> lines of
>
> df.filter(df("_corrupt_record").isNull)
>
> On Tue, Jul 26, 2016 at 1:53 PM, vr spark wrote:
> > i am readi
I am reading data from Kafka using Spark Streaming.
I am reading JSON and creating a dataframe.
I am using pyspark.
kvs = KafkaUtils.createDirectStream(ssc, kafkaTopic1, kafkaParams)
lines = kvs.map(lambda x: x[1])
lines.foreachRDD(mReport)
def mReport(clickRDD):
    clickDF = sqlContext.jsonRDD
I am reading data from Kafka using Spark Streaming.
I am reading JSON and creating a dataframe.
kvs = KafkaUtils.createDirectStream(ssc, kafkaTopic1, kafkaParams)
lines = kvs.map(lambda x: x[1])
lines.foreachRDD(mReport)
def mReport(clickRDD):
    clickDF = sqlContext.jsonRDD(clickRDD
val textFile = sc.textFile("README.md")
val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark.saveAsTextFile("output1")
Same error:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
file:/home/user/spark-1.5.1-bin-hadoop2.4/bin/README.md
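(sc.textFile resolves a bare path against the shell's working directory and
default filesystem, so a sketch of the usual fix is an explicit file: URI
pointing at where README.md actually lives; the path below assumes the layout
in the error message:

val textFile = sc.textFile("file:///home/user/spark-1.5.1-bin-hadoop2.4/README.md")
val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark.saveAsTextFile("output1")
)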
val count = inputfile.flatMap(line => line.split(" ")).map(word =>
(word,1)).reduceByKey(_ + _);
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
scala> val count = inputfile.flatMap(line => line.split((" ").map(word =>
     | (word,1)).reduceByKey(_ + _)
     |
     |
You typed two blank lines.  Starting a new command.
That is what I am getting; how do I solve this?
Regards,
Ramkrishna KT
I am using Spark version 1.5.1 and I am getting errors in my first Spark
program, i.e., word count. Please help me solve this.
scala> val inputfile = sc.textFile("input.txt")
inputfile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[13] at
textFile at <console>:21
scal
Hi All,
I am unable to run a Spark Streaming job in my Hadoop cluster; it is behaving
unexpectedly. When I submit a job, it fails by throwing a socket
exception in HDFS; if I run the same job a second or third time, it runs for
some time and then stops.
I am confused. Is there any configuration in YARN
experiences.
Thanks,
On Mon, Mar 28, 2016 at 10:40 PM, Spark Newbie
wrote:
> Hi All,
>
> The default value for spark.streaming.blockQueueSize is 10 in
> https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala.
> I
Hi All,
The default value for spark.streaming.blockQueueSize is 10 in
https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala.
In spark kinesis asl 1.4 the received Kinesis records are stored by calling
addData on line 115
Hi,
I am able to run a Spark Streaming job in local mode, but when I try to run
the same job in my YARN cluster, it throws errors.
Any help is appreciated in this regard.
Here are my exception logs:
Exception 1:
java.net.SocketTimeoutException: 48 millis timeout while waiting for
channel to
Dear All,
I am facing a problem with my Spark Twitter Streaming code: whenever twitter4j
throws an exception, I am unable to catch it. Could anyone help me
catch that exception?
Here is pseudo-code:
SparkConf sparkConf = new
SparkConf().setMaster("local[2]").setApp
Hi Friends,
Can anyone help me with how to terminate a Spark job in Eclipse using
Java code?
Thanks
Soniya
Hello friends,
I need urgent help.
I am using Spark Streaming to get tweets from Twitter and load the
data into HDFS. I want to find out the tweet source, whether it is from the
web or mobile web or Facebook, etc. Could you please help me with the logic?
Thanks
Soniya
logs and see why the sparkcontext is being
> shutdown? Similar discussion happened here previously.
> http://apache-spark-user-list.1001560.n3.nabble.com/RECEIVED-SIGNAL-15-SIGTERM-td23668.html
>
> Thanks
> Best Regards
>
> On Thu, Jan 21, 2016 at 5:11 PM, Soni spark
>
Hi Friends,
My Spark job runs successfully in local mode but fails in
cluster mode. Below is the error message I am getting. Can anyone help
me?
16/01/21 16:38:07 INFO twitter4j.TwitterStreamImpl: Establishing connection.
16/01/21 16:38:07 INFO twitter.TwitterReceiver: Twitter receiver
You need to make sure this class is accessible to all servers: since it is
cluster mode, the driver can be on any of the worker nodes.
On Fri, Dec 25, 2015 at 5:57 PM, Saiph Kappa wrote:
> Hi,
>
> I'm submitting a spark job like this:
>
> ~/spark-1.5.2-bin-hadoop2.6/bin/
Scenario 1:
val z = sc.parallelize(List("12","23","345",""), 2)
z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x + y)
res143: String = 10

Scenario 2:
val z = sc.parallelize(List("12","23","","345"), 2)
z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x
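(For what it's worth, this result depends on partitioning and combine order.
With two partitions ("12","23") and ("345",""): in the first, the seqOp gives
min("".length, "12".length).toString = "0", then min("0".length,
"23".length).toString = "1"; in the second it gives "0", then min("0".length,
"".length).toString = "0". The combOp then concatenates the per-partition
results onto "", and since partitions can finish in either order, both "10"
and "01" are possible answers.)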
local dirs and Spark recognizes that, so rather than
> re-computing, it will start from the following stage. So, this is a good
> thing in that you’re not re-computing a stage. In your case, it looks like
> there’s already the output of the userreqs RDD (reduceByKey) so it doesn’t
> re
What does the "Skipped Stage" below mean? Can anyone help clarify?
I was expecting 3 stages to succeed, but only 2 of them complete
while one is skipped.
Status: SUCCEEDED
Completed Stages: 2
Skipped Stages: 1
Scala REPL Code Used:
accounts is a basic RDD contains
Hi friends,
I am trying to create a Hive table through Spark with Java code in Eclipse,
using the code below:
HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
but I am getting an error
Hi Friends,
I have created a Hive external table with a partition. I want to alter the
Hive table partition through Spark with Java code:
alter table table1
add if not exists
partition(datetime='2015-12-01')
location 'hdfs://localhost:54310/spark/twitter/datetime=2015-12-01/
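(One sketch is to issue the same DDL through the HiveContext's sql() method;
Scala shown, the Java call is analogous, and the path is the one from the
mail above:

sqlContext.sql(
  """ALTER TABLE table1 ADD IF NOT EXISTS
    |PARTITION (datetime='2015-12-01')
    |LOCATION 'hdfs://localhost:54310/spark/twitter/datetime=2015-12-01/'
  """.stripMargin)
)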
Folks,
I have the following program:
SparkConf conf = new SparkConf().setMaster("local").setAppName("Indexer")
    .set("spark.driver.maxResultSize", "2g");
conf.set("es.index.auto.create", "true");
conf.set("es.nodes", "localhost");
conf.set("es.port", "9200");
conf.set("es.write.operation", "index");
Hi Friends,
I have written a Spark Streaming program in Java to access Twitter tweets, and
it is working fine. I am able to copy the Twitter feeds to an HDFS location
batch-wise. For each batch, it creates a folder with an epoch
timestamp. For example,
if I give the HDFS location as hdfs
Pinging again ...
On Wed, Nov 25, 2015 at 4:19 PM, Ted Yu wrote:
> Which Spark release are you using ?
>
> Please take a look at:
> https://issues.apache.org/jira/browse/SPARK-5594
>
> Cheers
>
> On Wed, Nov 25, 2015 at 3:59 PM, Spark Newbie
> wrote:
>
>>
Pinging again to see if anyone has any thoughts or prior experience with
this issue.
On Wed, Nov 25, 2015 at 3:56 PM, Spark Newbie
wrote:
> Hi Spark users,
>
> I have been seeing this issue where receivers enter a "stuck" state after
> it encounters the following exc
Using Spark-1.4.1
On Wed, Nov 25, 2015 at 4:19 PM, Ted Yu wrote:
> Which Spark release are you using ?
>
> Please take a look at:
> https://issues.apache.org/jira/browse/SPARK-5594
>
> Cheers
>
> On Wed, Nov 25, 2015 at 3:59 PM, Spark Newbie
> wrote:
>
>>
Hi Spark users,
I'm seeing the below exceptions once in a while, which cause tasks to fail
(even after retries, so I think it is a non-recoverable exception), hence the
stage fails and then the job gets aborted.
Exception ---
java.io.IOException: org.apache.spark.SparkException: Failed t
Hi Spark users,
I have been seeing this issue where receivers enter a "stuck" state after
encountering the following exception: "Error in block pushing thread -
java.util.concurrent.TimeoutException: Futures timed out".
I am running the application on spark-1.4.1 and u
Dear Friends,
I am struggling with Spark Twitter streaming. I am not getting any data.
Please correct the code below if you find any mistakes.
import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java
Are you using EMR?
You can install Hadoop-2.6.0 along with Spark-1.5.1 in your EMR cluster.
That brings the s3a jars to the worker nodes, and they become available to
your application.
On Thu, Oct 15, 2015 at 11:04 AM, Scott Reynolds
wrote:
> List,
>
> Right now we build our spark jobs
e Spark's configuration page). The job by default does not get
> resubmitted.
>
> You could try getting the logs of the failed executor, to see what caused
> the failure. Could be a memory limit issue, and YARN killing it somehow.
>
>
>
> On Wed, Oct 14, 2015 at 11:05
regardless of whether they were successfully processed or not.
On Wed, Oct 14, 2015 at 11:01 AM, Spark Newbie
wrote:
> I ran 2 different Spark 1.5 clusters that have been running for more than
> a day now. I do see jobs getting aborted due to task retries maxing out
> (default 4) d
I ran 2 different Spark 1.5 clusters that have been running for more than a
day now. I do see jobs getting aborted due to task retries maxing out
(default 4) due to ConnectionException. It seems like the executors die and
get restarted, and I was unable to find the root cause (same app cod
Hi Spark users,
I'm seeing the below exception in my Spark Streaming application. It
happens in the first stage, where the Kinesis receivers receive records and
perform a flatMap operation on the unioned DStream. A coalesce step also
happens as part of that stage to optimize the perfor
Hi Spark users,
Is there an easy way to turn on DEBUG logs in receivers and executors?
Setting sparkContext.setLogLevel seems to turn on the DEBUG level only on the
Driver.
Thanks,
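(A common approach is shipping a DEBUG log4j config to every executor; a
sketch, where --files and spark.executor.extraJavaOptions are real
spark-submit options and the file name is illustrative:

spark-submit \
  --files /path/to/log4j-debug.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-debug.properties" \
  ...

# log4j-debug.properties, minimally:
# log4j.rootCategory=DEBUG, console
)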
logs? I
can send them if that will help dig into the root cause.
On Fri, Oct 9, 2015 at 2:18 PM, Tathagata Das wrote:
> Can you provide the before stop and after restart log4j logs for this?
>
> On Fri, Oct 9, 2015 at 2:13 PM, Spark Newbie
> wrote:
>
>> Hi Spark Users,
Hi Spark Users,
I'm seeing checkpoint restore failures causing the application startup to
fail with the below exception. When I do "ls" on the S3 path, I sometimes see
the key listed and sometimes not. There are no part files
(checkpointed files) in the specified S3 path. T
Folks,
I have an input file which is gzipped. I use sc.textFile("foo.gz") and I see
the following problem. Can someone help me fix this?
15/09/03 10:05:32 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/09/03 10:05:32 INFO CodecPool: Got brand-new decompress
Hi all,
Can we create a data frame from an Excel sheet or CSV file? In the example
below, it seems they support only JSON:
DataFrame df =
sqlContext.read().json("examples/src/main/resources/people.json");
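(On Spark 1.x, CSV needs the external spark-csv package; a sketch, where the
package coordinates are the well-known Databricks ones and the path is
illustrative:

// spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
// (on Spark 2.0+ csv is built in: sqlContext.read.format("csv"))
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")        // first line holds column names
  .option("inferSchema", "true")   // sample the file to guess types
  .load("examples/src/main/resources/people.csv")
)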
I was running a Spark job to crunch a 9GB Apache log file when I saw the
following error:
15/08/25 04:25:16 WARN scheduler.TaskSetManager: Lost task 99.0 in stage 37.0
(TID 4115, ip-10-150-137-100.ap-southeast-1.compute.internal):
ExecutorLostFailure (executor 29 lost)
15/08/25 04:25:16 INFO
Folks,
I use the following Streaming API from KafkaUtils:
public JavaPairInputDStream<String, String> inputDStream() {
    HashSet<String> topicsSet = new HashSet<String>(Arrays.asList(topics.split(",")));
    HashMap<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put(Tokens.KAFKA_BROKER_LIST_TOKEN.getRealTokenName(), brokers);
Thanks for the reply.
Are Standalone and Mesos the only options? Is there a way to auto-relaunch if
the driver runs as a Hadoop YARN application?
On Wednesday, 19 August 2015 12:49 PM, Todd wrote:
There is an option for the spark-submit (Spark standalone or Mesos with
cluster deploy
Folks,
As I see it, the Driver program is a single point of failure. Now, I have seen
ways to make it recover from failures on a restart (using
checkpointing), but I have not seen anything on how to restart it
automatically if it crashes.
Will running the Driver as a Hadoop Yarn Applica
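(On YARN in cluster mode, the ApplicationMaster, and with it the driver, is
resubmitted on failure up to a configurable number of attempts; a sketch of
the relevant submit options, where both conf keys are real (the
validity-interval one appeared in later 1.x releases):

spark-submit \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
  your-streaming-app.jar

Spark Standalone and Mesos offer --supervise for the same purpose, as noted
elsewhere in the thread.)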
Forgot to mention. Here is how I run the program :
./bin/spark-submit --conf "spark.app.master"="local[1]"
~/workspace/spark-python/ApacheLogWebServerAnalysis.py
On Wednesday, 12 August 2015 10:28 AM, Spark Enthusiast
wrote:
I wrote a small python program
I wrote a small python program:
def parseLogs(self):
    """ Read and parse log file """
    self._logger.debug("Parselogs() start")
    self.parsed_logs = (self._sc
                        .textFile(self._logFile)
                        .map(self._parseApacheLogLine)
                        .cac
All the examples of Spark Streaming programming that I see assume streams of
lines that are then tokenised and acted upon (like the WordCount example).
How do I process streams that span multiple lines? Are there examples that I
can use?
String password = "";
String url = "jdbc:hive2://quickstart.cloudera:1/default";
On Friday, July 17, 2015 2:29 AM, Roberto Coluccio
wrote:
Hello community,
I'm currently using Spark 1.3.1 with Hive support for outputting processed data
on an external H
I struggled a lot with Scala: almost 10 days with no improvement. But when I
switched to Java 8, things were smooth, and I used DataFrames with Redshift and
Hive and all looked good. If you are very good in Scala then go with Scala;
otherwise Java is the best fit.
This is just my opinion, because I am Ja
Does DataFrame support nested JSON for dumping directly to a database?
For simple JSON it is working fine:
{"id":2,"name":"Gerald","email":"gbarn...@zimbio.com","city":"Štoky","country":"Czech
Republic","ip":"92.158.154.75"},
But for nested JSON it fails to load:
root |-- rows: array (nullable = true)
Hi All,
To start a new project in Spark, which technology is good: Java 8 or Scala?
I am a Java developer. Can I start with Java 8, or do I need to learn Scala?
Which is the better technology for a quick start on any POC project?
Thanks
- su
I have Spark 1.4 deployed on AWS EMR, but the SparkR read.df
method cannot load data from AWS S3.
1) "read.df" error message:
read.df(sqlContext,"s3://some-bucket/some.json","json")
15/07/09 04:07:01 ERROR r.RBackendHandler: loadDF on
org.apache.s
Hi, I am looking at how to load data into Redshift. Thanks
On Wednesday, July 8, 2015 12:47 AM, shahab
wrote:
Hi,
I did some experiments with loading data from S3 into Spark. I loaded data from
S3 using sc.textFile(). Have a look at the following code snippet:
val csv = sc.tex
Hi, can you help me with how to load data from an S3 bucket into Redshift? If
you have sample code, can you please send it to me?
Thanks, su
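(A common route at the time was the spark-redshift package, which stages data
in S3 and issues a Redshift COPY; a minimal sketch with placeholder connection
values:

// Read the source from S3 (format depends on the data), then write to Redshift.
// Requires the com.databricks:spark-redshift package and a Redshift JDBC driver.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .load("s3n://my-bucket/input/")

df.write
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://host:5439/db?user=USER&password=PASS")
  .option("dbtable", "my_table")
  .option("tempdir", "s3n://my-bucket/tmp/")   // staging area for COPY
  .mode("append")
  .save()
)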
Folks,
My use case is as follows:
My Driver program will be aggregating a bunch of event streams and acting on
them. The action on the aggregated events is configurable and can change
dynamically.
One way I can think of is to run the Spark Driver as a service, where a config
push can be caught via
Hi,
I have to build a system that reacts to a set of events. Each of these events
is a separate stream by itself, consumed from a different Kafka
topic, and hence will have a different InputDStream.
Questions:
Will I be able to do joins across multiple InputDStreams and collate the outp
.jets3t.service.S3ServiceException: S3 HEAD request failed for
'/user%2Fdidi' - ResponseCode=400, ResponseMessage=Bad Request
What does the user have to do here? I am using key & secret!
How can I simply create an RDD from a text file on S3?
Thanks
Didi
ble
at
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
at
org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:1
Spark is based on Scala and written in Scala. To debug and fix issues, I guess
learning Scala is good for the long term? Any advice?
On Thursday, June 25, 2015 1:26 PM, ayan guha wrote:
I am a Python fan, so I use Python. But what I have noticed is that some
features are typically 1-2 release
Hi All,
I am new to Spark; I just want to know which technology is good/best for
learning Spark:
1) Scala 2) Java 3) Python
I know Spark supports all 3 languages, but which one is best?
Thanks, su
Again, by Storm, you mean Storm Trident, correct?
On Wednesday, 17 June 2015 10:09 PM, Michael Segel
wrote:
Actually the reverse.
Spark Streaming is really a micro-batch system where the smallest window is 1/2
a second (500ms). So for CEP, it's not really a good idea.
So in terms
5 11:57 AM, Enno Shioji wrote:
We've evaluated Spark Streaming vs. Storm and ended up sticking with Storm.
Some of the important drawbacks are:
Spark has no back pressure (the receiver rate limit can alleviate this to a
certain point, but it's far from ideal). There is also no ex
event
Upstream services ---> KAFKA ---> Event Stream Processor ---> Complex Event
Processor ---> Elastic Search.
From what I understand, Storm will make a very good ESP and Spark Streaming
will make a good CEP.
But, we are also eva