> On Sun, 27 Feb 2022 at 20:12, Bjørn Jørgensen
> wrote:
>
>> Mitch: You are using Scala 2.11 to do this. Have a look at Building Spark
>> <https://spark.apache.org
>>>>> select * from (select d.name as Department, e.name as Employee,
>>>>> e.salary as Salary, dense_rank() over (partition by d.name order by
>>>>> e.salary desc) as rnk from Department d join Employee e on
>>>>> e.departmentId = d.id) a where rnk <= 3
>>>>>
>>>>> Time taken: 1212 ms
>>>>>
>>>>> But as per my understanding, the aggregation should have run faster.
>>>>> So my point is: if the dataset is huge, should I force some kind of
>>>>> map-reduce job, for example with something like
>>>>> df.groupByKey().reduceGroups()?
>>>>>
>>>>> I think the aggregation query is taking more time here because the
>>>>> dataset is small, and map-reduce tends to pay off only on a huge
>>>>> volume of data. I haven't tested it on big data yet, but I need some
>>>>> expert guidance here.
>>>>>
>>>>> Please correct me if I am wrong.
>>>>>
>>>>> TIA,
>>>>> Sid
>>>>>
>>>>>
>>>>>
>>>>>
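The ranking logic in the query above can be sketched outside Spark. A minimal plain-pandas illustration (the data is hypothetical; Spark's dense_rank() over a partition behaves like rank(method="dense") within a groupby):

```python
import pandas as pd

# Hypothetical Department/Employee rows standing in for the joined tables.
df = pd.DataFrame({
    "Department": ["IT", "IT", "IT", "IT", "IT", "Sales"],
    "Employee":   ["a",  "b",  "c",  "d",  "e",  "f"],
    "Salary":     [90,   85,   85,   70,   60,   50],
})

# Equivalent of: dense_rank() over (partition by Department order by Salary desc)
df["rnk"] = df.groupby("Department")["Salary"].rank(method="dense", ascending=False)

top3 = df[df["rnk"] <= 3]          # the "where rnk <= 3" filter
print(sorted(top3["Employee"]))    # → ['a', 'b', 'c', 'd', 'f'] — e (rank 4) is dropped
```

Note that b and c tie at rank 2, so d still makes rank 3; that ties-don't-consume-ranks behaviour is what distinguishes dense_rank from rank.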
--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge
+47 480 94 297
>>
>>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Sat, 26 Feb 2022 at 22:48, Sean Owen wrote:
>>
>>> I don't think any of that is related, no.
>>> How are your dependencies set up? Manually with IJ, or in a build file
>>> (Maven, Gradle)? Normally you do the latter and dependencies are taken
>>> care of for you, but your app would definitely have to express a
>>> dependency on the Scala libs.
>>>
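The build-file route suggested here would look roughly like this in sbt. A hypothetical minimal sketch, not a verified setup; the Scala and Spark versions are assumptions taken from elsewhere in the thread:

```
// build.sbt -- hypothetical minimal Spark project
scalaVersion := "2.12.15"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided"
```

Importing such a project into IntelliJ lets the Scala plugin resolve scala-library from the build definition instead of requiring a manually attached project SDK.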
>>> On Sat, Feb 26, 2022 at 4:25 PM Bitfox wrote:
>>>
>>>> Java SDK installed?
>>>>
>>>> On Sun, Feb 27, 2022 at 5:39 AM Sachit Murarka
>>>> wrote:
>>>>
>>>>> Hello ,
>>>>>
>>>>> Thanks for replying. I have installed the Scala plugin in IntelliJ,
>>>>> but it still gives the same error:
>>>>>
>>>>> Cannot find project Scala library 2.12.12 for module SparkSimpleApp
>>>>>
>>>>> Thanks
>>>>> Rajat
>>>>>
>>>>> On Sun, Feb 27, 2022, 00:52 Bitfox wrote:
>>>>>
>>>>>> You need to install Scala first; the current Scala version for Spark
>>>>>> is 2.12.15.
>>>>>> I would suggest installing Scala with sdk (SDKMAN), which works great.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Sun, Feb 27, 2022 at 12:10 AM rajat kumar <
>>>>>> kumar.rajat20...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello Users,
>>>>>>>
>>>>>>> I am trying to create a Spark application using Scala (IntelliJ).
>>>>>>> I have installed the Scala plugin in IntelliJ but still get the
>>>>>>> error below:
>>>>>>>
>>>>>>> Cannot find project Scala library 2.12.12 for module SparkSimpleApp
>>>>>>>
>>>>>>>
>>>>>>> Could anyone please help what I am doing wrong?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Rajat
>>>>>>>
>>>>>>
>>>> On Wed, 23 Feb 2022 at 04:06, bo yang wrote:
>>>>
>>>>> Hi Spark Community,
>>>>>
>>>>> We built an open source tool to deploy and run Spark on Kubernetes
>>>>> with a one-click command. For example, on AWS, it can automatically
>>>>> create an EKS cluster, node group, NGINX ingress, and the Spark
>>>>> Operator. Then you will be able to use curl or a CLI tool to submit
>>>>> Spark applications. After the deployment, you could also install
>>>>> Uber's Remote Shuffle Service to enable Dynamic Allocation on
>>>>> Kubernetes.
>>>>>
>>>>> Anyone interested in using or working together on such a tool?
>>>>>
>>>>> Thanks,
>>>>> Bo
>>>>>
>>>>>
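For context, a submission through the Spark Operator mentioned above typically takes the form of a SparkApplication custom resource. A rough, hypothetical sketch — the image, jar path, and versions are placeholders, not details of the tool described in this thread:

```yaml
# Hypothetical SparkApplication for the Spark Operator (spark-on-k8s-operator).
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: apache/spark:v3.2.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
  sparkVersion: "3.2.1"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: 512m
```

The curl-based submission the author mentions would presumably create such a resource (or an equivalent) through an ingress in front of the cluster.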
won't be able to achieve Spark functionality while loading the file in a
> distributed manner.
>
> Thanks,
> Sid
>
> On Wed, Feb 23, 2022 at 7:38 PM Bjørn Jørgensen
> wrote:
>
>> from pyspark import pandas as ps
>>
>>
>> ps.read_excel?
>> "Support b
.option("inferSchema", "true") \
>>> .load("/home/.../Documents/test_excel.xlsx")
>>>
>>> It is giving me the below error message:
>>>
>>> java.lang.NoClassDefFoundError: org/apache/logging/log4j/LogManager
>>>
>>> I tried several Jars for this error but no luck. Also, what would be the
>>> efficient way to load it?
>>>
>>> Thanks,
>>> Sid
>>>
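One commonly suggested workaround for Excel input, independent of the spark-excel jar issue above, is to read the file with pandas on the driver and hand the result to Spark. A minimal sketch — the file name is hypothetical, and the final createDataFrame step is shown as a comment because it needs a running SparkSession:

```python
import pandas as pd

# Write a small hypothetical workbook so the example is self-contained
# (pandas uses the openpyxl engine for .xlsx files).
pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}).to_excel(
    "test_excel.xlsx", index=False
)

pdf = pd.read_excel("test_excel.xlsx")   # plain pandas read, single machine
print(pdf.shape)                         # → (2, 2)

# With a live SparkSession this becomes a distributed DataFrame:
# sdf = spark.createDataFrame(pdf)
```

The trade-off Sid raises is real: this path loads the whole workbook on one machine, so it avoids jar conflicts but is not a distributed read.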
>>
I use jupyterlab on k8s with minio as s3 storage.
https://github.com/bjornjorgensen/jlpyk8s
With this code to start it all :)
from pyspark import pandas as ps
import re
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat
https://issues.apache.org/jira/browse/SPARK-36722
https://github.com/apache/spark/pull/33968
On 2021/09/11 10:06:50, Bjørn Jørgensen wrote:
> Hi, I am using "from pyspark import pandas as ps" in a master build from
> yesterday.
> I have some columns that I need to join into one.
> In pandas I use update.
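The pandas update pattern mentioned here can be sketched with toy columns (the column names are hypothetical; Series.update overwrites only where the other series has non-null values):

```python
import pandas as pd

df = pd.DataFrame({
    "info_a": ["x", None, None],
    "info_b": [None, "y", None],
})

# Fold info_b into info_a: non-null values from info_b win,
# existing info_a values are kept where info_b is null.
s = df["info_a"].copy()
s.update(df["info_b"])
print(s.tolist())   # → ['x', 'y', None]
```

This is effectively a per-row coalesce of the two columns, which is the behaviour pyspark.pandas aims to mirror for its pandas API on Spark.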
Holden Karau wrote:
>
> > You can change the UID of one of them to match, or you could add them both
> > to a group and set permissions to 770.
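Holden's second option (a shared group plus mode 770) can be sketched as shell commands. The directory and group name are hypothetical, and the group/ownership steps need root, so only the chmod part is shown live:

```shell
# Hypothetical shared directory both UIDs need to write to.
mkdir -p /tmp/spark-shared-demo

# Group setup (requires root; shown as comments for completeness):
#   groupadd sparkshare
#   chgrp sparkshare /tmp/spark-shared-demo
#   usermod -aG sparkshare user1 && usermod -aG sparkshare user2

# rwx for owner and group, nothing for others.
chmod 770 /tmp/spark-shared-demo
stat -c '%a' /tmp/spark-shared-demo   # → 770
```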
> >
> > On Tue, Aug 31, 2021 at 12:18 PM Bjørn Jørgensen
> > wrote:
> >
> >> Hi and thanks for
> >>
> >> However, once your parquet file is written to the work-dir, how are you
> >> going to utilise it?
> >>
> >> HTH
> >>
> >>
> >>
> >>
Hi, I have built and running spark on k8s. A link to my repo
https://github.com/bjornjorgensen/jlpyk8s
Everything seems to be running fine, but I can’t save to PVC.
If I convert the dataframe to pandas, then I can save it.
from pyspark.sql import SparkSession
spark = SparkSession.builder \
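The pandas fallback described here (convert on the driver, then write with pandas) can be sketched as follows. The output path is hypothetical, and the Spark half is commented out since it needs the live k8s session:

```python
import pandas as pd

# With a live session: pdf = df.toPandas()   # collects everything to the driver!
pdf = pd.DataFrame({"x": [1, 2, 3]})         # stand-in for the collected result

# Writing with pandas goes through the driver's local filesystem, which is
# why a mounted PVC path can work here even when Spark executors cannot
# write to it directly.
out = "/tmp/out_demo.csv"                    # e.g. a PVC mount like /home/jovyan/data
pdf.to_csv(out, index=False)

back = pd.read_csv(out)
print(len(back))   # → 3
```

The caveat is the same as with any toPandas() call: the full dataset must fit in driver memory, so this is a workaround rather than a fix for the executors' PVC access.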