Hello Everyone,
I am trying to set up a yarn cluster with three nodes (one master and two
workers).
I followed this tutorial :
https://linode.com/docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/
I also tried to run the YARN example at the end of that tutorial with the
wordcount job.
Hi there,
I'm trying to run Spark on EKS. I created an EKS cluster, added nodes, and am
now trying to submit a Spark job from an EC2 instance.
I ran the following commands to set up access:
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=admin
--serviceaccount=defaul
Hi Marcelo,
Maybe spark.sql.functions.explode gives what you need?
// Bruno
> Le 6 juin 2019 à 16:02, Marcelo Valle a écrit :
>
> Generating the city id (child) is easy, monotonically increasing id worked
> for me.
>
> The problem is the country (parent) which has to be in both countrie
Hello,
I'm running the Thrift server with PostgreSQL persistence for the Hive
metastore. I'm using Postgres 9.6 and Spark 2.4.3 in this environment.
When I start the Thrift server I get lots of errors while creating the schema,
and it happens every time I reach Postgres, like:
19/06/06 15:51:59 WARN Datastore
Hi Magnus, Thanks for replying.
To be honest, I didn't quite get the partition solution, but indeed, I was
trying to figure out a way of solving this with data frames only, without
rejoining.
I can't have a global list of countries in my real scenario, as the real
scenario is not reference data; countries was just an exa
Hi all,
We are facing a challenge where a simple use case seems non-trivial to
implement in structured streaming: an aggregation should be calculated,
and then some other aggregations should further aggregate on top of the
first aggregation. Something like:
1st aggregation: val df = dfIn.groupBy(a,b,c,d).
Generating the city id (child) is easy, monotonically increasing id worked
for me.
The problem is the country (parent) which has to be in both countries and
cities data frames.
On Thu, 6 Jun 2019 at 14:57, Magnus Nilsson wrote:
> Well, you could do a repartition on cityname/nrOfCities and use
Well, you could do a repartition on cityname/nrOfCities and use the
spark_partition_id function or the mapPartitionsWithIndex dataframe method
to add a city id column. Then just split the dataframe into two subsets. Be
careful of hash collisions on the repartition key though, or more than one
city mi
Great!
Thanks a lot.
Best,
Pablo.
Akshay,
First of all, thanks for the answer. I *am* using monotonically increasing
id, but that's not my problem.
My problem is that I want to output 2 tables from 1 data frame: 1 parent
table with an id for the group by, and 1 child table carrying the parent id
without the group by.
I was able to solve this
Hi,
This has been fixed here: https://github.com/apache/spark/pull/23546. It will
be available in Spark 3.0.0.
Best,
Stavros
On Wed, Jun 5, 2019 at 11:18 PM pacuna wrote:
> I'm trying to run a sample code that reads a file from s3 so I need the aws
> sdk and aws hadoop dependencies.
> If I assem
Additionally, there is a "uuid" function available as well, if that helps
your use case.
Akshay Bhardwaj
+91-97111-33849
On Thu, Jun 6, 2019 at 3:18 PM Akshay Bhardwaj <
akshay.bhardwaj1...@gmail.com> wrote:
> Hi Marcelo,
>
> If you are using Spark 2.3+ and the Dataset API/Spark SQL, you can use this
>
Hi Marcelo,
If you are using Spark 2.3+ and the Dataset API/Spark SQL, you can use the
built-in function "monotonically_increasing_id" in Spark.
A little tweaking with Spark SQL built-in functions can enable you to
achieve this without having to write code or define RDDs with map/reduce
functions.
Aksh