Thanks Sean for the quick response.
Logged a Jira: https://issues.apache.org/jira/browse/SPARK-32662
Will send a pull request shortly.
Regards,
Jatin
On Wed, Aug 19, 2020 at 6:58 PM Sean Owen wrote:
> I think that's true. You're welcome to open a pull request / JIRA to
> remove that
Awesome, thanks for explaining it.
On Wed, Aug 19, 2020 at 16:29 Russell Spitzer wrote:
> It determines whether it can use the checkpoint at runtime, so you'll be
> able to see it in the UI but not in the plan since you are looking at the
> plan
> before the job is actually running when it checks to
It determines whether it can use the checkpoint at runtime, so you'll be
able to see it in the UI but not in the plan, since you are looking at the
plan before the job is actually running, when it checks to see if it can
use the checkpoint in the lineage.
Here is a two stage job for example:
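(The example itself did not survive the digest; below is a minimal sketch of the behaviour Russell describes, assuming a local SparkContext, a hypothetical checkpoint directory, and a made-up two-stage job with a shuffle:)

```scala
import org.apache.spark.sql.SparkSession

object CheckpointDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]").appName("checkpoint-demo").getOrCreate()
    val sc = spark.sparkContext
    sc.setCheckpointDir("/tmp/checkpoints") // hypothetical path

    // Two stages: the reduceByKey introduces a shuffle boundary.
    val rdd = sc.parallelize(1 to 10).map(i => (i % 2, i)).reduceByKey(_ + _)

    rdd.checkpoint()                 // only MARKS the RDD; nothing runs yet
    println(rdd.isCheckpointed)      // false -- no action has been called

    rdd.count()                      // first action triggers the checkpoint job
    println(rdd.isCheckpointed)      // true -- lineage now truncated at the
    println(rdd.toDebugString)       // checkpoint file, visible in the UI too
    spark.stop()
  }
}
```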
I did it and looked at the lineage change BEFORE calling the action. No
success.
Job$ - isCheckpointed? false, getCheckpointFile: None
Job$ - recordsRDD.toDebugString:
(2) MapPartitionsRDD[7] at map at Job.scala:112 []
| MapPartitionsRDD[6] at map at Job.scala:111 []
| MapPartitionsRDD[5] at map at
Hi Prashant,
I have the problem only on K8S; it works fine when Spark is executed on
top of YARN.
I'm wondering whether the delegation token gets saved; any idea how to check that?
Could it be because KMS is in HA and Spark requests two delegation tokens?
For the testing, just running spark3 on top of
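(One hedged way to inspect what the driver actually obtains: a sketch, not a verified fix, with hypothetical host names, that turns on debug logging for Spark's delegation-token code and points the Hadoop KMS client at the HA provider URI:)

```
# log4j.properties on the driver -- logs each delegation token as it is obtained
log4j.logger.org.apache.spark.deploy.security=DEBUG

# Hadoop client config -- HA KMS endpoints, semicolon-separated (hypothetical hosts)
hadoop.security.key.provider.path=kms://https@kms1.example.com;kms2.example.com:9600/kms
```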
I think that's true. You're welcome to open a pull request / JIRA to
remove that requirement.
On Wed, Aug 19, 2020 at 3:21 AM Jatin Puri wrote:
>
> Hello,
>
> This is wrt
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala#L244
>
Hi Ivan,
Unlike cache/persist, checkpoint does not operate in-place but requires the
result to be assigned to a new variable. In your case:
val recordsRDD = convertToRecords(anotherRDD).checkpoint()
Best,
Jacob
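(A sketch of the pattern Jacob describes, with an assumed local session and a hypothetical checkpoint directory. Note it matches the Dataset API, where checkpoint() returns a new, checkpointed Dataset; the RDD variant behaves differently, as noted in the comments:)

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

val spark = SparkSession.builder().master("local[2]").getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/checkpoints") // hypothetical path

// Dataset API: checkpoint() is eager by default -- it runs a job immediately
// and returns a NEW Dataset; the original variable is not modified in place,
// so the result must be captured.
val records: Dataset[java.lang.Long] = spark.range(100)
val checkpointed = records.checkpoint()

// RDD API: rdd.checkpoint() returns Unit and only MARKS the RDD for
// checkpointing; the data is materialized on the next action.
val rdd = spark.sparkContext.parallelize(1 to 10)
rdd.checkpoint()
rdd.count() // checkpoint happens here
```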
On Wed, Aug 19, 2020 at 14:39 Ivan Petrov wrote:
> Hi!
> Seems like I do smth
Hi!
Seems like I'm doing something wrong. I call .checkpoint() on an RDD, but it's not
checkpointed...
What am I doing wrong?
val recordsRDD = convertToRecords(anotherRDD)
recordsRDD.checkpoint()
logger.info("checkpoint done")
logger.info(s"isCheckpointed? ${recordsRDD.isCheckpointed}, getCheckpointFile: ${recordsRDD.getCheckpointFile}")
-dev
Hi,
I have used Spark with HDFS encrypted with Hadoop KMS, and it worked well.
However, I cannot recall whether I had Kubernetes in the mix. From the
error alone, it is not clear what caused the failure. Can I reproduce
this somehow?
Thanks,
On Sat, Aug 15, 2020 at 7:18 PM Michel
Hello,
This is wrt
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala#L244
require(vocab.length > 0, "The vocabulary size should be > 0. Lower minDF as necessary.")
Currently, if `CountVectorizer` is trained on an empty dataset
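(A hedged sketch of how that require surfaces: fitting on an empty input, or with a minDF above every term's document frequency, leaves the vocabulary empty, and require throws IllegalArgumentException with the message above. The session setup and column names here are assumptions, not from the original mail:)

```scala
import org.apache.spark.ml.feature.CountVectorizer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._

// Empty dataset -> empty vocabulary -> require(vocab.length > 0) fails.
val empty = Seq.empty[Array[String]].toDF("words")
val cv = new CountVectorizer().setInputCol("words").setOutputCol("features")

try {
  cv.fit(empty)
} catch {
  case e: IllegalArgumentException => println(e.getMessage)
}
```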
Hi Joyan,
check this link: https://github.com/jackkolokasis/SparkInternals
Thanks
Iacovos
On 19/8/20 9:09 a.m., joyan sil wrote:
Hi Jack and Spark experts,
Further to the question asked in this thread, what are some
recommended resources (blog/videos) that have helped you to deep dive
into
Hi Jack and Spark experts,
Further to the question asked in this thread, what are some recommended
resources (blogs/videos) that have helped you deep dive into the Spark
source code?
Thanks
Regards
Joyan
On Wed, Aug 19, 2020 at 11:06 AM Jack Kolokasis
wrote:
> Hi,
>
> From my experience, I