Github user aremirata commented on the pull request:
https://github.com/apache/spark/pull/5076#issuecomment-169383377
Hi guys,
First of all, I would like to thank you for developing Spark and making it
open source so that we can use it. I'm Alger Remirata, a researcher from the
Philippines. I'm new to Spark and Scala, and I'm working on a project involving
matrix factorizations in Spark. I have a problem running ALS in Spark: it
throws a StackOverflowError due to a long lineage chain, according to comments
I found online. One of the suggestions is to use setCheckpointInterval so that
the RDDs are checkpointed every 10-20 iterations, which prevents the error. I
just want to ask for details on how to do checkpointing with ALS. I am using
the spark-kernel developed by IBM (https://github.com/ibm-et/spark-kernel)
instead of spark-shell.
Here are some of my specific questions regarding details on checkpoint:
1. When setting the checkpoint directory through SparkContext.setCheckpointDir(),
it needs to be a Hadoop-compatible directory. Can we use any available
HDFS-compatible directory?
2. What is meant by this comment in the ALS checkpointing code: "If the
checkpoint directory is not set in [[org.apache.spark.SparkContext]], this
setting is ignored."?
3. Is calling setCheckpointInterval the only code I need to add to make
checkpointing work for ALS?
4. I am getting this error: Name: java.lang.IllegalArgumentException,
Message: Wrong FS: expected file:///. How can I solve this? What is the
proper way of using checkpointing?
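For reference, here is a minimal sketch of the setup I have in mind (the
paths, rank, and iteration counts below are placeholders I made up, not from
my actual job):

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// The checkpoint directory must be set on the SparkContext first;
// otherwise ALS's setCheckpointInterval is silently ignored.
// On a cluster this should be an HDFS (or other Hadoop-compatible) path;
// a local file:/// path generally only works in local mode.
sc.setCheckpointDir("hdfs:///tmp/als-checkpoints") // placeholder path

// Parse "user,item,score" lines into Rating objects (placeholder input path).
val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
  val Array(user, item, score) = line.split(',')
  Rating(user.toInt, item.toInt, score.toDouble)
}

val model = new ALS()
  .setRank(10)               // placeholder value
  .setIterations(30)         // placeholder value
  .setCheckpointInterval(10) // checkpoint intermediate RDDs every 10 iterations
  .run(ratings)
```

Is this roughly the intended usage, or is something else needed?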
Thanks a lot!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]