I ran the PySpark streaming example queue_streaming.py but ran into the
following error. Does anyone know what might be wrong? Thanks
ERROR [2017-08-02 08:29:20,023] ({Stop-StreamingContext}
Logging.scala[logError]:91) - Cannot connect to Python process. It's
probably dead. Stopping StreamingContext
Thanks Mark. I'll examine the status more carefully to observe this.
From: Mark Hamstra
Sent: Tuesday, August 1, 2017 11:25:46 AM
To: user@spark.apache.org
Subject: Re: How can i remove the need for calling cache
Very likely, much of the potential duplication is
Thanks Vadim. I'll try that
From: Vadim Semenov
Sent: Tuesday, August 1, 2017 12:05:17 PM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: How can i remove the need for calling cache
You can use `.checkpoint()`:
```
val sc: SparkContext
sc.setCheckpointDir("hdfs:///tmp/checkpointDirectory")
myrdd.checkpoint()
val result1 = myrdd.map(op1(_))
result1.count() // Will save `myrdd` to HDFS and do map(op1…
val result2 = myrdd.map(op2(_))
result2.count() // Will load `myrdd` from HDFS
```
Here are the threads that talk about the problems we're experiencing. These
problems are exacerbated when we use cache/persist:
https://www.mail-archive.com/user@spark.apache.org/msg64987.html
https://www.mail-archive.com/user@spark.apache.org/msg64986.html
So I am looking for a way to reproduce the same
I am trying to use PySpark to read a Kafka stream and then write it to
Redis. However, PySpark does not have support for a ForEach sink. So, I am
thinking of reading the Kafka stream into a DataFrame in Python and then
sending that DataFrame into a Scala application to be written to Redis. Is
there
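For what it's worth, rather than shuttling a DataFrame from Python to Scala, one common pattern is to do the whole Kafka-to-Redis pipeline in Scala, where Structured Streaming's `ForeachWriter` (the foreach sink PySpark lacked at the time) is available. A hedged sketch under that assumption — `RedisClient` is a hypothetical wrapper standing in for whatever Redis client you actually use (e.g. Jedis), and the broker/topic/host values are placeholders:

```scala
import org.apache.spark.sql.{ForeachWriter, Row, SparkSession}

// Hypothetical Redis wrapper; substitute a real client such as Jedis.
class RedisWriter extends ForeachWriter[Row] {
  private var client: RedisClient = _

  override def open(partitionId: Long, version: Long): Boolean = {
    client = RedisClient.connect("redis-host", 6379) // placeholder endpoint
    true
  }

  override def process(row: Row): Unit =
    client.set(row.getString(0), row.getString(1)) // key, value columns

  override def close(errorOrNull: Throwable): Unit = client.close()
}

val spark = SparkSession.builder().appName("kafka-to-redis").getOrCreate()

spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
  .option("subscribe", "my-topic")                  // placeholder topic
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .foreach(new RedisWriter)
  .start()
```

The connection is opened once per partition in `open` and torn down in `close`, so you avoid creating a Redis connection per row.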
Very likely, much of the potential duplication is already being avoided
even without calling cache/persist. When running the above code without
`myrdd.cache`, have you looked at the Spark web UI for the Jobs? For at
least one of them you will likely see that many Stages are marked as
"skipped", whi
Hi Jeff, that looks sane to me. Do you have additional details?
On 1 August 2017 at 11:05, jeff saremi wrote:
> Calling cache/persist fails all our jobs (I have posted 2 threads on
> this).
>
> And we're giving up hope in finding a solution.
> So I'd like to find a workaround for that:
>
> If
Calling cache/persist fails all our jobs (I have posted 2 threads on this).
And we're giving up hope of finding a solution.
So I'd like to find a workaround for that:
If I save an RDD to hdfs and read it back, can I use it in more than one
operation?
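Yes — an RDD written to HDFS and read back behaves like any other input RDD and can feed as many jobs as you like, with no long lineage to recompute. A minimal sketch of that workaround (the path, record type, and `op1`/`op2` are placeholders standing in for the real pipeline):

```scala
// Materialize the RDD to HDFS once.
myrdd.saveAsObjectFile("hdfs:///tmp/myrdd-materialized")

// Reading it back yields a fresh RDD backed by the HDFS files.
val reloaded = sc.objectFile[MyRecord]("hdfs:///tmp/myrdd-materialized")

val result1 = reloaded.map(op1(_))
result1.count() // reads the saved files from HDFS

val result2 = reloaded.map(op2(_))
result2.count() // reads from HDFS again; myrdd's lineage is never recomputed
```

The trade-off versus `cache` is that every job re-reads from HDFS rather than from memory; versus `checkpoint` it behaves similarly, except you manage the files and their cleanup yourself.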
Example: (using cache)
// do a whole bunch
I have no idea about this
---Original---
From: "serkan taş"
Date: 2017/7/31 16:42:59
To: "pandees waran";"??"<1427357...@qq.com>;
Cc: "user@spark.apache.org";
Subject: Re: Spark parquet file read problem !
Thank you very much.
Schema merge fixed the structure problem b
I am trying to connect to the AWS managed ES service using the Spark ES
Connector, but am not able to.
I am passing es.nodes and es.port along with es.nodes.wan.only set to true.
But it fails with below error:
34 ERROR NetworkClient: Node [x.x.x.x:443] failed (The server x.x.x.x
failed to respond); no ot
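In case it helps: an AWS-managed ES domain terminates TLS on port 443, so `es.net.ssl` usually needs to be enabled alongside `es.nodes.wan.only`. A hedged sketch of the read-side configuration — the endpoint and index names are placeholders, and whether this resolves the error also depends on the domain's access policy:

```scala
// Reading from an AWS-managed Elasticsearch endpoint with elasticsearch-hadoop.
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "my-domain.us-east-1.es.amazonaws.com") // placeholder host
  .option("es.port", "443")
  .option("es.net.ssl", "true")        // AWS ES serves HTTPS on 443
  .option("es.nodes.wan.only", "true") // don't try to reach data nodes directly
  .load("my-index/my-type")            // placeholder index
```

Without `es.net.ssl`, the connector speaks plain HTTP to port 443, which typically surfaces as exactly this kind of "failed to respond" error from NetworkClient.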
Hi,
I have a weird issue - spark streaming application fails once/twice a day
with java.io.OptionalDataException. It happens when deserializing task.
The problem appeared after migration from spark 1.6 to spark 2.2, cluster
runs in latest CDH distro, YARN mode.
I have no clue what to blame and w