Kevin Van Lieshout created SPARK-32534:
------------------------------------------

             Summary: Cannot load a Pipeline Model on a stopped Spark Context
                 Key: SPARK-32534
                 URL: https://issues.apache.org/jira/browse/SPARK-32534
             Project: Spark
          Issue Type: Bug
          Components: Deploy, Kubernetes
    Affects Versions: 2.4.6
            Reporter: Kevin Van Lieshout


I am running Spark on a Kubernetes cluster that runs Spark NLP, using the 
PySpark ML PipelineModel class to load the model and then transform the Spark 
dataframe. We run this within a Docker container that starts up a Spark 
context, mounts volumes, spins up executors, etc., then runs its 
transformations, UDFs, and so on, and finally shuts down the Spark context. 
The first time I load the model after my service has just been started, 
everything is fine. If I run my application a second time without restarting 
my service, even though the context from the previous run is entirely stopped 
and a new one has been started, the PipelineModel has some attribute in one of 
its base classes that still references the stopped context, so I get a "cannot 
call a function on a stopped spark context" error when I try to load the model 
in my service again.

I have to shut down my service each time if I want consecutive runs through my 
Spark pipeline, which is not ideal. I was wondering whether this is a common 
issue among fellow PySpark users who use PipelineModel, whether there is a 
common workaround for resetting all Spark contexts, or whether the 
PipelineModel caches a Spark context of some sort. Any help is very useful.
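For reference, here is a minimal sketch of the lifecycle that triggers the 
error (the model path and dataframe contents are placeholders, not our actual 
job; the failure shows up on the second invocation within the same Python 
process):

from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

NLP_MODEL = "/models/nlp_pipeline"  # placeholder path

def run_job():
    # A fresh context per run, but the same Python process throughout.
    spark = SparkSession.builder.getOrCreate()
    try:
        pipeline = PipelineModel.read().load(NLP_MODEL)
        df = spark.createDataFrame([("some text",)], ["text"])
        pipeline.transform(df).show()
    finally:
        spark.stop()  # the context is fully stopped between runs

run_job()  # first run: works
run_job()  # second run: "cannot call a function on a stopped spark context"
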
cls.pipeline = PipelineModel.read().load(NLP_MODEL)

is how I load the model. And our Spark context setup is very similar to a 
typical Kubernetes/Spark setup; nothing special there.
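One workaround I have been considering (untested, and it rests on my 
assumption that the reader caches a SparkSession internally) is to bind the 
reader to the freshly created session explicitly via MLReader's session() 
method before loading:

from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

# Assumption: the stale context comes from a session cached inside the
# reader, so hand it the current session explicitly instead.
spark = SparkSession.builder.getOrCreate()
cls.pipeline = PipelineModel.read().session(spark).load(NLP_MODEL)

Failing that, resetting PySpark's cached session singleton 
(SparkSession._instantiatedSession, a private attribute, so this is 
version-dependent) before building the new context might also be worth a try.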


