Asking this on a tangent:
Is there any way for the shuffle data to be replicated to more than one server?
thanks
From: jeff saremi
Sent: Friday, July 28, 2017 4:38:08 PM
To: Juan Rodríguez Hortalá
Cc: user@spark.apache.org
Subject: Re: Job keeps aborting because
Thanks Juan for taking the time
Here's more info:
- This is running on Yarn in Master mode
- See config params below
- This is a corporate environment. In general, nodes should not be added to or
removed from the cluster that often. Even if that happens, I would expect it
to be one or two servers.
Hi Jeff,
Can you provide more information about how you are running your job? In
particular:
- Which cluster manager are you using? Is it YARN, Mesos, or Spark
Standalone?
- Which configuration options are you using to submit the job? In
particular, are you using dynamic allocation or the external shuffle service?
Also, in your example, doesn't the temp view need to be accessed using the
same SparkSession on the Scala side? Since I am not using a notebook, how
can I get access to the same SparkSession in Scala?
On Fri, Jul 28, 2017 at 3:17 PM, Priyank Shrivastava wrote:
Thanks Burak.
In a streaming context would I need to do any state management for the temp
views? for example across sliding windows.
Priyank
On Fri, Jul 28, 2017 at 3:13 PM, Burak Yavuz wrote:
Hi Priyank,
You may register them as temporary tables to use across language boundaries.
Python:
    df = spark.readStream...
    # Python logic
    df.createOrReplaceTempView("tmp1")
Scala:
    val df = spark.table("tmp1")
    df.writeStream
      .foreach(...)
On Fri, Jul 28, 2017 at 3:06 PM, Priyank Shrivastava w
TD,
For a hybrid Python-Scala approach, what's the recommended way of handing
off a DataFrame from Python to Scala? I would like to know especially for a
streaming context.
I am not using notebooks/Databricks. We are running it on our own Spark
2.1 cluster.
Priyank
On Wed, Jul 26, 2017 at 12:4
Can we add an extra library (jars on S3) to spark-submit? If yes, how? For example
via --jars, extraClassPath, or extraLibPath.
Thanks,
Richard
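A minimal sketch of what such a submission could look like. The bucket, jar names, and main class below are made-up examples, and it assumes the cluster's Hadoop is configured to read s3a:// paths (e.g. the hadoop-aws connector with credentials set up):

```shell
# Hypothetical bucket/jar/class names; assumes s3a:// access is configured.
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --jars s3a://my-bucket/libs/dep1.jar,s3a://my-bucket/libs/dep2.jar \
  s3a://my-bucket/apps/myapp.jar
```

Note the difference: --jars takes a comma-separated list and distributes the jars to driver and executors, while spark.driver.extraClassPath / spark.executor.extraClassPath expect paths that already exist locally on each node.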
Hi,
This problem is very annoying and I'm tired of searching the web
without finding any good advice to follow.
I have a complex job. It had been working fine until I needed to save
partial results (RDDs) to files.
So I tried to cache the RDDs, then call a saveAsText method and follow
the workf
Hi,
I am saving the output of my streaming process to S3.
I want to be able to change the directory of the stream as each hour passes.
Will this work:
parsed_kf_frame.saveAsTextFiles(s3_location.format(
    datetime.datetime.today().strftime("%Y%m%d"),
    datetime.datetime.today().strftime("%H")))
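The path-formatting part of the snippet can be checked in plain Python; s3_location below is a hypothetical template with two placeholders, and a fixed timestamp stands in for datetime.today() so the result is reproducible:

```python
import datetime

# Hypothetical S3 path template with date and hour placeholders.
s3_location = "s3://my-bucket/output/{}/{}"

now = datetime.datetime(2017, 7, 28, 16, 5)  # fixed time for a reproducible example
path = s3_location.format(now.strftime("%Y%m%d"), now.strftime("%H"))
print(path)  # s3://my-bucket/output/20170728/16
```

One thing worth checking: if the prefix string is built once when the DStream is set up, strftime is evaluated only once, not per batch; recomputing the path every hour generally needs something like foreachRDD.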
For YARN, I'm speaking about the fairscheduler.xml file (if you kept YARN's
default scheduling):
https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Allocation_file_format
Yohann Jardin
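For reference, a minimal sketch of what such an allocation file can look like; the queue name and limits are made-up examples, and the element names follow the Hadoop 2.7 FairScheduler documentation linked above:

```xml
<?xml version="1.0"?>
<!-- Minimal fair-scheduler allocation sketch; queue name and limits are
     hypothetical examples. -->
<allocations>
  <queue name="spark_jobs">
    <minResources>10000 mb,2 vcores</minResources>
    <maxResources>90000 mb,32 vcores</maxResources>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>
```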
On 7/28/2017 at 8:00 PM, jeff saremi wrote:
The only relevant setting I see in YARN is this:
yarn.nodemanager.resource.memory-mb
120726
which is roughly 120 GB, and we are well below that. I don't see a total limit.
I haven't played with spark.memory.fraction. I'm not sure if it makes a
difference. Note that there are no errors coming
I'm not sure we are OK on one thing: the YARN limits are for the sum across all
nodes, while you only specify the memory for a single node through Spark.
By the way, the memory displayed in the UI is only part of the total memory
allocation:
https://spark.apache.org/docs/latest/configuration.h
We have a not-too-complex and not-too-large Spark job that keeps dying with
this error.
I have researched it and have not seen any convincing explanation of why.
I am not using a shuffle service. Which server is the one that is refusing the
connection?
If I go to the server that is being reported
Thanks so much Yohann
I checked the Storage/Memory column in the Executors status page. Well below where
I wanted to be.
I will try the suggestion on smaller data sets.
I am also well within the YARN limits (128 GB). In my last try I asked for
48 + 32 (overhead). So somehow I am exceeding that, or
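As a sanity check on those numbers, a small sketch of the YARN container sizing arithmetic. It assumes Spark 2.x's default overhead rule, max(384 MB, 10% of executor memory), which applies unless spark.yarn.executor.memoryOverhead is set explicitly; the 48 GB / 32 GB figures come from the message above:

```python
# YARN container size = executor memory + memory overhead.
# Spark 2.x default overhead: max(384 MB, 0.10 * executor memory),
# unless spark.yarn.executor.memoryOverhead is set explicitly.

def container_size_mb(executor_mem_mb, overhead_mb=None):
    """Total memory YARN must grant for one executor container."""
    if overhead_mb is None:
        overhead_mb = max(384, int(0.10 * executor_mem_mb))
    return executor_mem_mb + overhead_mb

# Explicit 32 GB overhead, as in the message above:
print(container_size_mb(48 * 1024, 32 * 1024))  # 81920 MB = 80 GB
# Default overhead for a 48 GB executor:
print(container_size_mb(48 * 1024))             # 54067 MB (~52.8 GB)
```

So with an explicit 32 GB overhead, each container request is 80 GB, which is what YARN compares against yarn.scheduler.maximum-allocation-mb and the node's yarn.nodemanager.resource.memory-mb.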
Thanks. If I don't use Window and instead choose to stream the data onto HDFS,
could you suggest how to store only one week's worth of data? Should I create a
cron job to delete HDFS files older than a week? Please let me know if you
have any other suggestions.
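One way to sketch the retention logic, assuming a hypothetical layout of date-named directories like /data/20170728. The HDFS listing and deletion calls would come from whatever client you use (or a shell wrapper around hdfs dfs), so plain strings stand in here:

```python
import datetime

def dirs_to_delete(dir_names, today, keep_days=7):
    """Return the date-named directories older than keep_days.

    dir_names: directory basenames in YYYYMMDD form (hypothetical layout).
    """
    cutoff = today - datetime.timedelta(days=keep_days)
    expired = []
    for name in dir_names:
        d = datetime.datetime.strptime(name, "%Y%m%d").date()
        if d < cutoff:
            expired.append(name)
    return expired

dirs = ["20170720", "20170721", "20170725", "20170728"]
print(dirs_to_delete(dirs, datetime.date(2017, 7, 28)))
# ['20170720']
```

A cron entry could run such a script daily and pass the expired names to hdfs dfs -rm -r.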
We have a few Spark Streaming apps running on our AWS Spark 2.1 YARN cluster.
We currently log on to the master node of the cluster and start each app
using spark-submit, calling the jar.
We would like to open this up to our users, so that they can submit their
own apps, but we would not be able to
I think it will be the same, but let me try that.
FYR - https://issues.apache.org/jira/browse/SPARK-19881
On Fri, Jul 28, 2017 at 4:44 PM, ayan guha wrote:
> Try running spark.sql("set yourconf=val")
Try running spark.sql("set yourconf=val")
On Fri, 28 Jul 2017 at 8:51 pm, Chetan Khatri wrote:
> Jorn, Both are same.
Jorn, Both are same.
On Fri, Jul 28, 2017 at 4:18 PM, Jörn Franke wrote:
> Try sparksession.conf().set
Try sparksession.conf().set
> On 28. Jul 2017, at 12:19, Chetan Khatri wrote:
Hey Dev/User,
I am working with Spark 2.0.1 with dynamic partitioning in Hive and am
facing the issue below:
org.apache.hadoop.hive.ql.metadata.HiveException:
Number of dynamic partitions created is 1344, which is more than 1000.
To solve this try to set hive.exec.max.dynamic.partitions to at least 1344.
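Following the suggestions earlier in the thread (spark.sql("set ...") or sparkSession.conf().set), a sketch of raising the limit; 2000 is an arbitrary value above the 1344 partitions the error reports:

```sql
-- Sketch: raise the Hive dynamic-partition limits above the 1344 reported.
-- In Spark these can be issued via spark.sql("SET ...").
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=2000;
```

Whether such SET commands actually take effect for Hive dynamic-partition inserts in Spark 2.0.1 is exactly what the SPARK-19881 link above tracks.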
All right, I did not catch the point, sorry for that.
But you can take a snapshot of the heap and then analyze the heap dump with MAT
or other tools.
From the code I cannot find any clue.
On 2017-07-28 at 17:09 GMT+08:00, Gourav Sengupta wrote:
Hi,
I have done all of that, but my question is: why should 62 MB of data give a
memory error when we have over 2 GB of memory available?
Therefore all that is mentioned by Zhoukang is not pertinent at all.
Regards,
Gourav Sengupta
On Fri, Jul 28, 2017 at 4:43 AM, 周康 wrote:
> testdf.persist(py