Github user aremirata commented on the pull request:
https://github.com/apache/spark/pull/5076#issuecomment-169383377
Hi guys,
First of all, I would like to thank you for developing Spark and making it
open source so that we can use it. I'm Alger Remirata, a researcher from the
Philippines. I'm new to Spark and Scala, and I'm working on a project involving
matrix factorizations in Spark. I have a problem running ALS in Spark: it
throws a StackOverflowError due to a long lineage chain, according to comments
I found online. One of the suggestions is to use setCheckpointInterval so that
the RDDs are checkpointed every 10-20 iterations, which prevents the error. I
just want to ask for details on how to do checkpointing with ALS. I am using
the spark-kernel developed by IBM (https://github.com/ibm-et/spark-kernel)
instead of spark-shell.
Here are some of my specific questions regarding details on checkpoint:
1. When setting the checkpoint directory through SparkContext.setCheckpointDir(),
it needs to be a Hadoop-compatible directory. Can we use any available
HDFS-compatible directory?
2. What is meant by this comment in the ALS checkpointing code: "If the
checkpoint directory is not set in [[org.apache.spark.SparkContext]], this
setting is ignored."?
3. Is calling setCheckpointInterval the only code I need to add to make
checkpointing work for ALS?
4. I am getting this error: Name: java.lang.IllegalArgumentException,
Message: Wrong FS: expected file:///. How can I solve this? What is the
proper way of using checkpointing?
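For reference, here is a minimal sketch of the setup I have in mind (the
paths, rank, and iteration counts below are placeholders I made up, not from
my actual job):

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// The checkpoint directory must be set on the SparkContext first;
// otherwise ALS's setCheckpointInterval is silently ignored.
// On a cluster this should be an HDFS (or other Hadoop-compatible) path;
// a local file:/// path generally only works in local mode.
sc.setCheckpointDir("hdfs:///tmp/als-checkpoints") // placeholder path

// Parse "user,item,score" lines into Rating objects (placeholder input path).
val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
  val Array(user, item, score) = line.split(',')
  Rating(user.toInt, item.toInt, score.toDouble)
}

val model = new ALS()
  .setRank(10)               // placeholder value
  .setIterations(30)         // placeholder value
  .setCheckpointInterval(10) // checkpoint intermediate RDDs every 10 iterations
  .run(ratings)
```

Is this roughly the intended usage, or is something else needed?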
Thanks a lot!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]