This command only defines a new DataFrame; Spark evaluates lazily, so to see
any results you need to trigger an action, e.g. merged_spark_data.show() on a
new line.
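For example, in the same pyspark shell (show() and count() are standard
DataFrame actions; showing 5 rows is just illustrative):

    merged_spark_data = spark.read.csv(
        r"C:\Users\Kelum.Perera\Downloads\data-master\nyse_all\nyse_data\*",
        header=False,
    )
    merged_spark_data.show(5)         # action: runs the job, prints 5 rows
    print(merged_spark_data.count())  # action: counts rows across all files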

Regarding the error, I think it's a typical error you get when you run Spark
on Windows. The UnsatisfiedLinkError on NativeIO$Windows.access0 means
Hadoop's native Windows binaries are missing. You can fix it with the
winutils tool (search for "winutils" to see how): put winutils.exe and
hadoop.dll built for your Hadoop version on the machine and point HADOOP_HOME
at them.
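A minimal sketch of that setup, assuming you have downloaded winutils.exe and
hadoop.dll into C:\hadoop\bin (the path is just an example). The environment
must be set before the JVM starts, so either set these as system environment
variables before launching the pyspark shell, or in a script before creating
the SparkSession:

    import os
    from pyspark.sql import SparkSession

    # Illustrative path: HADOOP_HOME points at the folder whose bin\
    # subfolder contains winutils.exe and hadoop.dll.
    os.environ["HADOOP_HOME"] = r"C:\hadoop"
    os.environ["PATH"] = r"C:\hadoop\bin;" + os.environ["PATH"]

    # Create the session only after the environment is in place.
    spark = SparkSession.builder.master("local[*]").getOrCreate()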

On Thu, 12 Oct 2023, 11:58 Kelum Perera, <kelum0...@hotmail.com> wrote:

> Dear friends,
>
> I'm trying to get a fresh start with Spark. I tried to read a few CSV
> files in a folder, but the task got stuck and did not complete, as shown
> in the content copied from the terminal below.
>
> Can someone help to understand what is going wrong?
>
> Versions:
> java version "11.0.16" 2022-07-19 LTS
> Java(TM) SE Runtime Environment 18.9 (build 11.0.16+11-LTS-199)
> Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.16+11-LTS-199, mixed
> mode)
>
> Python 3.9.13
> Windows 10
>
> Copied from the terminal:
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
>       /_/
>
> Using Python version 3.9.13 (main, Aug 25 2022 23:51:50)
> Spark context Web UI available at http://LK510FIDSLW4.ey.net:4041
> Spark context available as 'sc' (master = local[*], app id =
> local-1697089858181).
> SparkSession available as 'spark'.
> >>> merged_spark_data = spark.read.csv(r"C:\Users\Kelum.Perera\Downloads\data-master\nyse_all\nyse_data\*", header=False)
> Exception in thread "globPath-ForkJoinPool-1-worker-115"
> java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
>         at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
>         at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:793)
>         at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1249)
>         at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1454)
>         at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:601)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)
>         at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:761)
>         at org.apache.hadoop.fs.Globber.listStatus(Globber.java:128)
>         at org.apache.hadoop.fs.Globber.doGlob(Globber.java:291)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:202)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2124)
>         at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:238)
>         at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$3(DataSource.scala:737)
>         at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:380)
>         at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
>         at scala.util.Success.$anonfun$map$1(Try.scala:255)
>         at scala.util.Success.map(Try.scala:213)
>         at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>         at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>         at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>         at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
>         at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>         at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>         at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
>         at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
>         at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
>
>
>
> Nothing happens afterwards. I appreciate your kind input to solve this.
>
> Best Regards,
> Kelum Perera
>
