I am getting a class not found error (a Python ModuleNotFoundError) on this import:
    import org.apache.spark.SparkContext

It sounds like this is because PySpark is not installed, but as far as I
can tell it is. pip reports PySpark as already installed for the correct
Python version (3.10):


root@namenode:/home/spark/# pip3.10 install pyspark
Requirement already satisfied: pyspark in /usr/local/lib/python3.10/dist-packages (3.4.1)
Requirement already satisfied: py4j==0.10.9.7 in /usr/local/lib/python3.10/dist-packages (from pyspark) (0.10.9.7)
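
A quick way to double-check what the interpreter can actually see is
sketched below. It is only a minimal sketch: `is_importable` is a
hypothetical helper (not part of PySpark), and it assumes it is run with the
same Python 3.10 interpreter that runs the script. It asks Python's import
machinery whether each name resolves as a Python module, without starting
Spark.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python's import system can locate module_name.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    try:
        # find_spec imports parent packages of a dotted name, so a missing
        # top-level package (e.g. 'org') raises ModuleNotFoundError here.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # 'pyspark' is the pip-installed package name; the dotted name is the
    # one used in the failing import statement.
    for name in ("pyspark", "org.apache.spark.SparkContext"):
        print(f"{name}: importable = {is_importable(name)}")
```

If `pyspark` is reported as importable while the `org.apache.spark...` name
is not, the package itself is visible to this interpreter and only the
dotted JVM-style name fails to resolve as a Python module.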


      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.4.1
      /_/

Using Python version 3.10.12 (main, Jun 11 2023 05:26:28)
Spark context Web UI available at http://namenode:4040
Spark context available as 'sc' (master = yarn, app id = application_1692452853354_0008).
SparkSession available as 'spark'.
Traceback (most recent call last):
  File "/home/spark/real-estate/pullhttp/pull_apartments.py", line 11, in <module>
    import org.apache.spark.SparkContext
ModuleNotFoundError: No module named 'org.apache.spark.SparkContext'
2023-08-20T19:45:19,242 INFO  [Thread-5] spark.SparkContext: SparkContext is stopping with exitCode 0.
2023-08-20T19:45:19,246 INFO  [Thread-5] server.AbstractConnector: Stopped Spark@467be156{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2023-08-20T19:45:19,247 INFO  [Thread-5] ui.SparkUI: Stopped Spark web UI at http://namenode:4040
2023-08-20T19:45:19,251 INFO  [YARN application state monitor] cluster.YarnClientSchedulerBackend: Interrupting monitor thread
2023-08-20T19:45:19,260 INFO  [Thread-5] cluster.YarnClientSchedulerBackend: Shutting down all executors
2023-08-20T19:45:19,260 INFO  [dispatcher-CoarseGrainedScheduler] cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
2023-08-20T19:45:19,263 INFO  [Thread-5] cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
2023-08-20T19:45:19,267 INFO  [dispatcher-event-loop-29] spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2023-08-20T19:45:19,271 INFO  [Thread-5] memory.MemoryStore: MemoryStore cleared
2023-08-20T19:45:19,271 INFO  [Thread-5] storage.BlockManager: BlockManager stopped
2023-08-20T19:45:19,275 INFO  [Thread-5] storage.BlockManagerMaster: BlockManagerMaster stopped
2023-08-20T19:45:19,276 INFO  [dispatcher-event-loop-8] scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2023-08-20T19:45:19,279 INFO  [Thread-5] spark.SparkContext: Successfully stopped SparkContext
2023-08-20T19:45:19,687 INFO  [shutdown-hook-0] util.ShutdownHookManager: Shutdown hook called
2023-08-20T19:45:19,688 INFO  [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/spark-9375452d-1989-4df5-9d85-950f751ce034/pyspark-2fcfbc8e-fd40-41f5-bf8d-e4c460332895
2023-08-20T19:45:19,689 INFO  [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/spark-bf6cbc46-ad8b-429a-9d7a-7d98b7d7912e
2023-08-20T19:45:19,690 INFO  [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/spark-9375452d-1989-4df5-9d85-950f751ce034
2023-08-20T19:45:19,691 INFO  [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/localPyFiles-6c113b2b-9ac3-45e3-9032-d1c83419aa64
