This should work; check your path. pyspark should resolve from your Spark installation:

which pyspark
/opt/spark/bin/pyspark

And your installation should contain:

cd $SPARK_HOME
/opt/spark> ls
LICENSE  NOTICE  R  README.md  RELEASE  bin  conf  data  examples  jars
kubernetes  licenses  logs  python  sbin  yarn

You should use

from pyspark import SparkConf, SparkContext

And this is your problem — "import org.apache.spark.SparkContext" is Scala
syntax, not Python, so Python looks for a module named 'org' and fails:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.4.1
      /_/

Using Python version 3.9.16 (main, Apr 22 2023 14:16:13)
Spark context Web UI available at http://rhes76:4040
Spark context available as 'sc' (master = local[*], app id = local-1692606989942).
SparkSession available as 'spark'.
>>> import org.apache.spark.SparkContext
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
*ModuleNotFoundError: No module named 'org'*

HTH

Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


On Mon, 21 Aug 2023 at 07:12, Kal Stevens <kalgstev...@gmail.com> wrote:

> Are there installation instructions for Spark 3.4.1?
>
> I defined SPARK_HOME as it describes here:
>
> https://spark.apache.org/docs/latest/api/python/getting_started/install.html
>
> ls $SPARK_HOME/python/lib
> py4j-0.10.9.7-src.zip  PY4J_LICENSE.txt  pyspark.zip
>
> I am getting a class not found error:
>
> import org.apache.spark.SparkContext
>
> I also unzipped those files just in case, but that gives the same error.
>
> It sounds like this is because pyspark is not installed, but as far as I
> can tell it is.
> Pyspark is installed in the correct Python version:
>
> root@namenode:/home/spark/# pip3.10 install pyspark
> Requirement already satisfied: pyspark in
> /usr/local/lib/python3.10/dist-packages (3.4.1)
> Requirement already satisfied: py4j==0.10.9.7 in
> /usr/local/lib/python3.10/dist-packages (from pyspark) (0.10.9.7)
>
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.4.1
>       /_/
>
> Using Python version 3.10.12 (main, Jun 11 2023 05:26:28)
> Spark context Web UI available at http://namenode:4040
> Spark context available as 'sc' (master = yarn, app id =
> application_1692452853354_0008).
> SparkSession available as 'spark'.
> Traceback (most recent call last):
>   File "/home/spark/real-estate/pullhttp/pull_apartments.py", line 11, in <module>
>     import org.apache.spark.SparkContext
> ModuleNotFoundError: No module named 'org.apache.spark.SparkContext'
> 2023-08-20T19:45:19,242 INFO [Thread-5] spark.SparkContext: SparkContext
> is stopping with exitCode 0.
> 2023-08-20T19:45:19,246 INFO [Thread-5] server.AbstractConnector: Stopped
> Spark@467be156{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
> 2023-08-20T19:45:19,247 INFO [Thread-5] ui.SparkUI: Stopped Spark web UI
> at http://namenode:4040
> 2023-08-20T19:45:19,251 INFO [YARN application state monitor]
> cluster.YarnClientSchedulerBackend: Interrupting monitor thread
> 2023-08-20T19:45:19,260 INFO [Thread-5]
> cluster.YarnClientSchedulerBackend: Shutting down all executors
> 2023-08-20T19:45:19,260 INFO [dispatcher-CoarseGrainedScheduler]
> cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to
> shut down
> 2023-08-20T19:45:19,263 INFO [Thread-5]
> cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
> 2023-08-20T19:45:19,267 INFO [dispatcher-event-loop-29]
> spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint
> stopped!
> 2023-08-20T19:45:19,271 INFO [Thread-5] memory.MemoryStore: MemoryStore
> cleared
> 2023-08-20T19:45:19,271 INFO [Thread-5] storage.BlockManager:
> BlockManager stopped
> 2023-08-20T19:45:19,275 INFO [Thread-5] storage.BlockManagerMaster:
> BlockManagerMaster stopped
> 2023-08-20T19:45:19,276 INFO [dispatcher-event-loop-8]
> scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
> OutputCommitCoordinator stopped!
> 2023-08-20T19:45:19,279 INFO [Thread-5] spark.SparkContext: Successfully
> stopped SparkContext
> 2023-08-20T19:45:19,687 INFO [shutdown-hook-0] util.ShutdownHookManager:
> Shutdown hook called
> 2023-08-20T19:45:19,688 INFO [shutdown-hook-0] util.ShutdownHookManager:
> Deleting directory
> /tmp/spark-9375452d-1989-4df5-9d85-950f751ce034/pyspark-2fcfbc8e-fd40-41f5-bf8d-e4c460332895
> 2023-08-20T19:45:19,689 INFO [shutdown-hook-0] util.ShutdownHookManager:
> Deleting directory /tmp/spark-bf6cbc46-ad8b-429a-9d7a-7d98b7d7912e
> 2023-08-20T19:45:19,690 INFO [shutdown-hook-0] util.ShutdownHookManager:
> Deleting directory /tmp/spark-9375452d-1989-4df5-9d85-950f751ce034
> 2023-08-20T19:45:19,691 INFO [shutdown-hook-0] util.ShutdownHookManager:
> Deleting directory /tmp/localPyFiles-6c113b2b-9ac3-45e3-9032-d1c83419aa64
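[Editor's note] For readers landing on this thread, a minimal Python sketch of the fix discussed above. The Scala-style line fails because Python treats "org" as a top-level package; the Python-side classes live in the pyspark package instead. The app name and master URL below are illustrative, not taken from the thread, and the pyspark part is guarded so the sketch also runs where pyspark is not installed.

```python
import importlib.util

# The Scala-style import from the traceback is not valid Python:
# the interpreter looks for a top-level module named "org" and fails.
try:
    import org.apache.spark.SparkContext  # Scala syntax; raises in Python
except ModuleNotFoundError as err:
    print(err)  # ModuleNotFoundError, as seen in both tracebacks above

# The Python equivalents come from the pyspark package.
# Guarded so this sketch is runnable even without pyspark on the machine.
if importlib.util.find_spec("pyspark") is not None:
    from pyspark import SparkConf, SparkContext

    # Illustrative app name and master; use your own deployment settings.
    conf = SparkConf().setAppName("import-check").setMaster("local[*]")
    sc = SparkContext.getOrCreate(conf)
    print("Spark", sc.version)
    sc.stop()
```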