Asking this on a tangent:
Is there any way for the shuffle data to be replicated to more than one server?
thanks
From: jeff saremi
Sent: Friday, July 28, 2017 4:38:08 PM
To: Juan Rodríguez Hortalá
Cc: user@spark.apache.org
Subject: Re: Job keeps aborting because
Thanks Juan for taking the time
Here's more info:
- This is running on Yarn in Master mode
- See config params below
- This is a corporate environment. In general, nodes should not be added to or
removed from the cluster that often. Even if that happens, I would expect it
to be one or two servers.
Hi Jeff,
Can you provide more information about how you are running your job? In
particular:
- Which cluster manager are you using? Is it YARN, Mesos, or Spark
Standalone?
- Which configuration options are you using to submit the job? In
particular, are you using dynamic allocation or the external shuffle service?
Also, in your example, doesn't the temp view need to be accessed using the
same SparkSession on the Scala side? Since I am not using a notebook, how
can I get access to the same SparkSession in Scala?
On Fri, Jul 28, 2017 at 3:17 PM, Priyank Shrivastava wrote:
Thanks Burak.
In a streaming context would I need to do any state management for the temp
views? for example across sliding windows.
Priyank
On Fri, Jul 28, 2017 at 3:13 PM, Burak Yavuz wrote:
Hi Priyank,
You may register them as temporary tables to use across language boundaries.
Python:
    df = spark.readStream...
    # Python logic
    df.createOrReplaceTempView("tmp1")
Scala:
    val df = spark.table("tmp1")
    df.writeStream
      .foreach(...)
On Fri, Jul 28, 2017 at 3:06 PM, Priyank Shrivastava w
TD,
For a hybrid Python-Scala approach, what's the recommended way of handing
off a DataFrame from Python to Scala? I would like to know especially for a
streaming context.
I am not using notebooks/Databricks. We are running it on our own Spark
2.1 cluster.
Priyank
On Wed, Jul 26, 2017 at 12:4
Can we add an extra library (jars on S3) to spark-submit? If yes, how? For example
via --jars, extraClassPath, or extraLibPath.
Thanks,
Richard
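A minimal sketch of what such a submission could look like. The bucket, jar names, and main class below are made-up examples, and it assumes the cluster's Hadoop is configured to read s3a:// paths (e.g. the hadoop-aws connector with credentials set up):

```shell
# Hypothetical bucket/jar/class names; assumes s3a:// access is configured.
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --jars s3a://my-bucket/libs/dep1.jar,s3a://my-bucket/libs/dep2.jar \
  s3a://my-bucket/apps/myapp.jar
```

Note the difference: --jars takes a comma-separated list and distributes the jars to driver and executors, while spark.driver.extraClassPath / spark.executor.extraClassPath expect paths that already exist locally on each node.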
Hi,
This problem is very annoying and I'm tired of searching the web
without finding any good advice to follow.
I have a complex job. It had been working fine until I needed to save
partial results (RDDs) to files.
So I tried to cache the RDDs, then call a saveAsText method and follow
the workf
Hi,
I am saving the output of my streaming process to S3.
I want to be able to change the directory of the stream as each hour passes.
Will this work:
parsed_kf_frame.saveAsTextFiles(s3_location.format(
    datetime.datetime.today().strftime("%Y%m%d"),
    datetime.datetime.today().strftime("%H")))
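The path-formatting part of the snippet can be checked in plain Python; s3_location below is a hypothetical template with two placeholders, and a fixed timestamp stands in for datetime.today() so the result is reproducible:

```python
import datetime

# Hypothetical S3 path template with date and hour placeholders.
s3_location = "s3://my-bucket/output/{}/{}"

now = datetime.datetime(2017, 7, 28, 16, 5)  # fixed time for a reproducible example
path = s3_location.format(now.strftime("%Y%m%d"), now.strftime("%H"))
print(path)  # s3://my-bucket/output/20170728/16
```

One thing worth checking: if the prefix string is built once when the DStream is set up, strftime is evaluated only once, not per batch; recomputing the path every hour generally needs something like foreachRDD.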
For YARN, I'm speaking about the fairscheduler.xml file (if you kept YARN's
default scheduling):
https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Allocation_file_format
Yohann Jardin
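For reference, a minimal sketch of what such an allocation file can look like; the queue name and limits are made-up examples, and the element names follow the Hadoop 2.7 FairScheduler documentation linked above:

```xml
<?xml version="1.0"?>
<!-- Minimal fair-scheduler allocation sketch; queue name and limits are
     hypothetical examples. -->
<allocations>
  <queue name="spark_jobs">
    <minResources>10000 mb,2 vcores</minResources>
    <maxResources>90000 mb,32 vcores</maxResources>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>
```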
On 7/28/2017 at 8:00 PM, jeff saremi wrote:
The only relevant setting I see in YARN is this:
yarn.nodemanager.resource.memory-mb
120726
which is roughly 120 GB, and we are well below that. I don't see a total limit.
I haven't played with spark.memory.fraction. I'm not sure if it makes a
difference. Note that there are no errors coming
I'm not sure we are OK on one thing: the YARN limits are for the sum across all
nodes, while you only specify the memory for a single node through Spark.
By the way, the memory displayed in the UI is only part of the total memory
allocation:
https://spark.apache.org/docs/latest/configuration.h
We have a not-too-complex and not-too-large Spark job that keeps dying with
this error.
I have researched it and have not seen any convincing explanation of why.
I am not using a shuffle service. Which server is the one that is refusing the
connection?
If I go to the server that is being reported
Thanks so much Yohann
I checked the Storage/Memory column in the Executors status page. Well below where
I wanted to be.
I will try the suggestion on smaller data sets.
I am also well within the YARN limits (128 GB). In my last try I asked for
48 + 32 (overhead). So somehow I am exceeding that, or
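As a sanity check on those numbers, a small sketch of the YARN container sizing arithmetic. It assumes Spark 2.x's default overhead rule, max(384 MB, 10% of executor memory), which applies unless spark.yarn.executor.memoryOverhead is set explicitly; the 48 GB / 32 GB figures come from the message above:

```python
# YARN container size = executor memory + memory overhead.
# Spark 2.x default overhead: max(384 MB, 0.10 * executor memory),
# unless spark.yarn.executor.memoryOverhead is set explicitly.

def container_size_mb(executor_mem_mb, overhead_mb=None):
    """Total memory YARN must grant for one executor container."""
    if overhead_mb is None:
        overhead_mb = max(384, int(0.10 * executor_mem_mb))
    return executor_mem_mb + overhead_mb

# Explicit 32 GB overhead, as in the message above:
print(container_size_mb(48 * 1024, 32 * 1024))  # 81920 MB = 80 GB
# Default overhead for a 48 GB executor:
print(container_size_mb(48 * 1024))             # 54067 MB (~52.8 GB)
```

So with an explicit 32 GB overhead, each container request is 80 GB, which is what YARN compares against yarn.scheduler.maximum-allocation-mb and the node's yarn.nodemanager.resource.memory-mb.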
Thanks. If I don't use Window and instead choose to stream the data onto HDFS,
could you suggest how to store only one week's worth of data? Should I create a
cron job to delete HDFS files older than a week? Please let me know if you
have any other suggestions.
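One way to sketch the retention logic, assuming a hypothetical layout of date-named directories like /data/20170728. The HDFS listing and deletion calls would come from whatever client you use (or a shell wrapper around hdfs dfs), so plain strings stand in here:

```python
import datetime

def dirs_to_delete(dir_names, today, keep_days=7):
    """Return the date-named directories older than keep_days.

    dir_names: directory basenames in YYYYMMDD form (hypothetical layout).
    """
    cutoff = today - datetime.timedelta(days=keep_days)
    expired = []
    for name in dir_names:
        d = datetime.datetime.strptime(name, "%Y%m%d").date()
        if d < cutoff:
            expired.append(name)
    return expired

dirs = ["20170720", "20170721", "20170725", "20170728"]
print(dirs_to_delete(dirs, datetime.date(2017, 7, 28)))
# ['20170720']
```

A cron entry could run such a script daily and pass the expired names to hdfs dfs -rm -r.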
We have a few Spark Streaming apps running on our AWS Spark 2.1 YARN cluster.
We currently log on to the master node of the cluster and start each app
using spark-submit, calling the jar.
We would like to open this up to our users, so that they can submit their
own apps, but we would not be able to
I think it will be the same, but let me try that.
FYR - https://issues.apache.org/jira/browse/SPARK-19881
On Fri, Jul 28, 2017 at 4:44 PM, ayan guha wrote:
> Try running spark.sql("set yourconf=val")
Try running spark.sql("set yourconf=val")
On Fri, 28 Jul 2017 at 8:51 pm, Chetan Khatri wrote:
> Jorn, Both are same.
Jorn, Both are same.
On Fri, Jul 28, 2017 at 4:18 PM, Jörn Franke wrote:
> Try sparksession.conf().set
Try sparksession.conf().set
> On 28. Jul 2017, at 12:19, Chetan Khatri wrote:
Hey Dev/User,
I am working with Spark 2.0.1 with dynamic partitioning in Hive and am
facing the issue below:
org.apache.hadoop.hive.ql.metadata.HiveException:
Number of dynamic partitions created is 1344, which is more than 1000.
To solve this try to set hive.exec.max.dynamic.partitions to at least 1344.
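Following the suggestions earlier in the thread (spark.sql("set ...") or sparkSession.conf().set), a sketch of raising the limit; 2000 is an arbitrary value above the 1344 partitions the error reports:

```sql
-- Sketch: raise the Hive dynamic-partition limits above the 1344 reported.
-- In Spark these can be issued via spark.sql("SET ...").
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=2000;
```

Whether such SET commands actually take effect for Hive dynamic-partition inserts in Spark 2.0.1 is exactly what the SPARK-19881 link above tracks.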
All right, I did not catch the point, sorry for that.
But you can take a snapshot of the heap and then analyze the heap dump with MAT
or other tools.
From the code I cannot find any clue.
On 2017-07-28 at 17:09 GMT+08:00, Gourav Sengupta wrote:
Hi,
I have done all of that, but my question is: why should 62 MB of data give a
memory error when we have over 2 GB of memory available?
Therefore all that is mentioned by Zhoukang is not pertinent at all.
Regards,
Gourav Sengupta
On Fri, Jul 28, 2017 at 4:43 AM, 周康 wrote:
> testdf.persist(py