Re: Issue while creating spark app

2022-02-27 Thread Bjørn Jørgensen
responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction.

Re: Issue while creating spark app

2022-02-27 Thread Bjørn Jørgensen
be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Sun, 27 Feb 2022 at 20:12, Bjørn Jørgensen > wrote: > >> Mitch: You are using scala 2.11 to do this. Have a look at Building Spark >> <https://spark.apache.org

Re: Difference between windowing functions and aggregation functions on big data

2022-02-27 Thread Bjørn Jørgensen
as Department, e.name as Employee,e.salary as >>>>> Salary,dense_rank() over(partition by d.name order by e.salary desc) >>>>> as rnk from Department d join Employee e on e.departmentId=d.id ) a >>>>> where rnk<=3 >>>>> >>>>> Time taken: 1212 ms >>>>> >>>>> But as per my understanding, the aggregation should have run faster. >>>>> So, my whole point is if the dataset is huge I should force some kind of >>>>> map reduce jobs like we have an option called >>>>> df.groupby().reduceByGroups() >>>>> >>>>> So I think the aggregation query is taking more time since the dataset >>>>> size here is smaller and as we all know that map reduce works faster when >>>>> there is a huge volume of data. Haven't tested it yet on big data but >>>>> needed some expert guidance over here. >>>>> >>>>> Please correct me if I am wrong. >>>>> >>>>> TIA, >>>>> Sid >>>>> >>>>> >>>>> >>>>> -- Bjørn Jørgensen Vestre Aspehaug 4, 6010 Ålesund Norge +47 480 94 297

Re: Issue while creating spark app

2022-02-27 Thread Bjørn Jørgensen
n.com/in/mich-talebzadeh-ph-d-5205b2/> >> >> >> >> https://en.everybodywiki.com/Mich_Talebzadeh >> >> >> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for >> any loss, damage or destruction of data or any other property which may >> arise from relying on this email's technical content is explicitly >> disclaimed. The author will in no case be liable for any monetary damages >> arising from such loss, damage or destruction. >> >> >> >> >> On Sat, 26 Feb 2022 at 22:48, Sean Owen wrote: >> >>> I don't think any of that is related, no. >>> How are you dependencies set up? manually with IJ, or in a build file >>> (Maven, Gradle)? Normally you do the latter and dependencies are taken care >>> of for you, but you app would definitely have to express a dependency on >>> Scala libs. >>> >>> On Sat, Feb 26, 2022 at 4:25 PM Bitfox wrote: >>> >>>> Java SDK installed? >>>> >>>> On Sun, Feb 27, 2022 at 5:39 AM Sachit Murarka >>>> wrote: >>>> >>>>> Hello , >>>>> >>>>> Thanks for replying. I have installed Scala plugin in IntelliJ first >>>>> then also it's giving same error >>>>> >>>>> Cannot find project Scala library 2.12.12 for module SparkSimpleApp >>>>> >>>>> Thanks >>>>> Rajat >>>>> >>>>> On Sun, Feb 27, 2022, 00:52 Bitfox wrote: >>>>> >>>>>> You need to install scala first, the current version for spark is >>>>>> 2.12.15 >>>>>> I would suggest you install scala by sdk which works great. >>>>>> >>>>>> Thanks >>>>>> >>>>>> On Sun, Feb 27, 2022 at 12:10 AM rajat kumar < >>>>>> kumar.rajat20...@gmail.com> wrote: >>>>>> >>>>>>> Hello Users, >>>>>>> >>>>>>> I am trying to create spark application using Scala(Intellij). >>>>>>> I have installed Scala plugin in intelliJ still getting below error:- >>>>>>> >>>>>>> Cannot find project Scala library 2.12.12 for module SparkSimpleApp >>>>>>> >>>>>>> >>>>>>> Could anyone please help what I am doing wrong? 
>>>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Rajat >>>>>>> >>>>>> -- Bjørn Jørgensen Vestre Aspehaug 4, 6010 Ålesund Norge +47 480 94 297

Re: One click to run Spark on Kubernetes

2022-02-23 Thread Bjørn Jørgensen
ail's technical content is explicitly >>>> disclaimed. The author will in no case be liable for any monetary damages >>>> arising from such loss, damage or destruction. >>>> >>>> >>>> >>>> >>>> On Wed, 23 Feb 2022 at 04:06, bo yang wrote: >>>> >>>>> Hi Spark Community, >>>>> >>>>> We built an open source tool to deploy and run Spark on Kubernetes >>>>> with a one click command. For example, on AWS, it could automatically >>>>> create an EKS cluster, node group, NGINX ingress, and Spark Operator. Then >>>>> you will be able to use curl or a CLI tool to submit Spark application. >>>>> After the deployment, you could also install Uber Remote Shuffle Service >>>>> to >>>>> enable Dynamic Allocation on Kubernetes. >>>>> >>>>> Anyone interested in using or working together on such a tool? >>>>> >>>>> Thanks, >>>>> Bo >>>>> >>>>> -- Bjørn Jørgensen Vestre Aspehaug 4, 6010 Ålesund Norge +47 480 94 297

Re: Loading .xlsx and .xlx files using pyspark

2022-02-23 Thread Bjørn Jørgensen
't be able to achieve spark functionality while loading the file in > distributed manner. > > Thanks, > Sid > > On Wed, Feb 23, 2022 at 7:38 PM Bjørn Jørgensen > wrote: > >> from pyspark import pandas as ps >> >> >> ps.read_excel? >> "Support b

Re: Loading .xlsx and .xlx files using pyspark

2022-02-23 Thread Bjørn Jørgensen
.option("inferSchema", "true") \ >>> .load("/home/.../Documents/test_excel.xlsx") >>> >>> It is giving me the below error message: >>> >>> java.lang.NoClassDefFoundError: org/apache/logging/log4j/LogManager >>> >>> I tried several Jars for this error but no luck. Also, what would be the >>> efficient way to load it? >>> >>> Thanks, >>> Sid >>> >> -- Bjørn Jørgensen Vestre Aspehaug 4, 6010 Ålesund Norge +47 480 94 297

Re: Choice of IDE for Spark

2021-10-06 Thread Bjørn Jørgensen
I use jupyterlab on k8s with minio as s3 storage. https://github.com/bjornjorgensen/jlpyk8s With this code to start it all :) from pyspark import pandas as ps import re import numpy as np import pandas as pd from pyspark.sql import SparkSession from pyspark.sql.functions import concat, concat

Re: Problems with update function in koalas - pyspark pandas.

2021-09-12 Thread Bjørn Jørgensen
https://issues.apache.org/jira/browse/SPARK-36722 https://github.com/apache/spark/pull/33968 On 2021/09/11 10:06:50, Bjørn Jørgensen wrote: > Hi I am using "from pyspark import pandas as ps" in a master build yesterday. > I do have some columns that I need to join to one. > In pandas I u

Problems with update function in koalas - pyspark pandas.

2021-09-11 Thread Bjørn Jørgensen
Hi I am using "from pyspark import pandas as ps" in a master build yesterday. I do have some columns that I need to join to one. In pandas I use update. 54 FD_OBJECT_SUPPLIES_SERVICES_OBJECT_SUPPLY_SERVICE_ADDITIONAL_INFORMATION

Re: Can’t write to PVC in K8S

2021-09-02 Thread Bjørn Jørgensen
:50, Holden Karau wrote: > > > You can change the UID of one of them to match, or you could add them both > > to a group and set permissions to 770. > > > > On Tue, Aug 31, 2021 at 12:18 PM Bjørn Jørgensen > > wrote: > > > >> Hi and thanks for

Re: Can’t write to PVC in K8S

2021-08-31 Thread Bjørn Jørgensen
> >> > >> However, once your parquet file is written to the work-dir, how are you > >> going to utilise it? > >> > >> HTH > >> > >> > >> > >> > >>view my Linkedin profile > >> <https://www.linkedin

Re: Can’t write to PVC in K8S

2021-08-30 Thread Bjørn Jørgensen
05b2/> > > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be li

Can’t write to PVC in K8S

2021-08-30 Thread Bjørn Jørgensen
Hi, I have built and running spark on k8s. A link to my repo https://github.com/bjornjorgensen/jlpyk8s Everything seems to be running fine, but I can’t save to PVC. If I convert the dataframe to pandas, then I can save it. from pyspark.sql import SparkSession spark = SparkSession.builder \
