Hi,
10.x.x.x is a private network range (see https://en.wikipedia.org/wiki/IP_address).
You should use the public IP of your AWS instance instead.
On Sat, Apr 29, 2017 at 6:35 AM, Yuan Fang
wrote:
>
> object SparkPi {
> private val logger = Logger(this.getClass)
>
> val sparkConf = new SparkConf()
> .setAppName("
Hi Tim,
The Spark ML API doesn't currently support setting an initial model for GMM. I hope we
can get this feature into Spark 2.3.
Thanks
Yanbo
On Fri, Apr 28, 2017 at 1:46 AM, Tim Smith wrote:
> Hi,
>
> I am trying to figure out the API to initialize a Gaussian mixture model
> using either centroids crea
Jacek,
Thanks for your help. I didn’t want to write a bug/enhancement unless
warranted.
~ Shawn
From: Jacek Laskowski [mailto:ja...@japila.pl]
Sent: Thursday, April 27, 2017 8:39 AM
To: Lavelle, Shawn
Cc: user
Subject: Re: Spark-SQL Query Optimization: overlapping ranges
Hi Shawn,
If yo
Hi,
the following code reads a table from my PostgreSQL database, following
guidance I've found online:
val txs = spark.read.format("jdbc").options(Map(
("driver" -> "org.postgresql.Driver"),
("url" -> "jdbc:postgresql://host/dbname"),
("dbtable" -> "(se
Oh, and if you want a default other than null ("n/a" below is just an example default):
import org.apache.spark.sql.functions._
df.withColumn("address", coalesce($"address", lit("n/a")))
On Mon, May 1, 2017 at 10:29 AM, Michael Armbrust
wrote:
> The following should work:
>
> val schema = implicitly[org.apache.spark.sql.Encoder[Course]].sch
The following should work:
val schema = implicitly[org.apache.spark.sql.Encoder[Course]].schema
spark.read.schema(schema).parquet("data.parquet").as[Course]
Note this will only work for nullable fields (i.e. if you add a primitive
like Int you need to make it an Option[Int])
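A minimal, Spark-free sketch of that rule (the `Course` case class here is hypothetical): a primitive `Int` field can never hold null, so a column that may contain nulls must be declared as `Option[Int]` for the derived Encoder to mark it nullable:

```scala
// Hypothetical schema: `credits` may be null in the data, so it is
// declared Option[Int]; a plain Int field would fail on null values.
case class Course(id: Int, title: String, credits: Option[Int])

// Option encodes presence/absence: a null credits value becomes None.
val withCredits = Course(1, "Scala", Some(3))
val noCredits   = Course(2, "Spark", None)

println(withCredits.credits.getOrElse(0)) // 3
println(noCredits.credits.getOrElse(0))   // 0
```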
On Sun, Apr 30, 2017
Two more ways:
*Using the Typed Dataset API with Rows*
Caveat: The docs about flatMapGroups do warn "This function does not
support partial aggregation, and as a result requires shuffling all the
data in the Dataset. If an application intends to perform an aggregation
over each key, it is best to
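As a plain-Scala analogy (not the Spark API itself): like `flatMapGroups`, `groupBy` gathers every row of a key together before the per-group function runs, so nothing is pre-aggregated before the grouping step:

```scala
// All rows for a key are materialized into one collection first
// (the "shuffle"), and only then does the per-group function run --
// there is no partial aggregation along the way.
val rows = Seq(("a", 1), ("a", 2), ("b", 3))
val perKey: Map[String, Int] =
  rows.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

println(perKey("a")) // 3
println(perKey("b")) // 3
```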
Use cache or persist. The DataFrame will be materialized when the first
action is called and then reused from memory for each subsequent usage.
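A plain-Scala analogy of the idea (no Spark cluster needed): like a cached DataFrame, a `lazy val` is computed once on the first use and then reused, instead of being recomputed per access:

```scala
// The block stands in for the expensive JDBC read; `reads` counts
// how many times it actually runs.
var reads = 0
lazy val table: Seq[Int] = { reads += 1; 1 to 5 }

val firstAction  = table.sum // triggers the "read" once
val secondAction = table.max // reuses the already-computed result

println(reads) // 1
```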
On 1 May 2017 at 4:51 PM, "Saulo Ricci" wrote:
> Hi,
>
>
> I have the following code that is reading a table into an Apache Spark
> DataFrame:
>
> val df =
Hi,
I have the following code that is reading a table into an Apache Spark
DataFrame:
val df = spark.read.format("jdbc")
.option("url","jdbc:postgresql://host/database")
.option("dbtable","tablename").option("user","username")
.option("password", "password")
.load()
When I first
On 28 Apr 2017, at 16:10, Anubhav Agarwal wrote:
Are you using Spark's textFiles method? If so, go through this blog:
http://tech.kinja.com/how-not-to-pull-from-s3-using-apache-spark-1704509219
That's an old/dated blog post.
If you get the Hadoop 2.8 binaries on your clas