from:"Jan Štěrba"

Re: bug? using withColumn with colName with dot can't replace column

2016-03-15 Thread Jan Štěrba

First off, I would advise against having dots in column names; that's just playing with fire. Second, the exception is really strange, since Spark is complaining about a completely unrelated column. I would like to see the df schema before the exception was thrown. -- Jan Sterba

Re: adding rows to a DataFrame

2016-03-11 Thread Jan Štěrba

It very much depends on the logic that generates the new rows. If it is per row (i.e. without context), then you can just convert to an RDD and perform a map operation on each row. JavaPairRDD grouped = dataFrame.javaRDD().groupBy( group by what you need, probably ID ); return

Re: Spark on YARN memory consumption

2016-03-11 Thread Jan Štěrba

YARN container memory overhead. Also,
> typically the memory increments for YARN containers is 1GB.
>
> This gives a good overview:
> http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
>
> Thanks,
> Silvio
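
The quoted reply mentions two effects: a per-container memory overhead and YARN's allocation increment. A rough sketch of how they combine is below. It assumes the overhead defaults documented for Spark of that era (10% of the executor memory, with a 384 MB floor) and the "typical" 1 GB increment mentioned in the reply; none of these values are confirmed for the cluster in this thread, and the function name `yarn_container_mb` is made up for illustration.

```python
import math


def yarn_container_mb(executor_memory_mb, increment_mb=1024,
                      overhead_fraction=0.10, min_overhead_mb=384):
    """Estimate the YARN container size allocated for one executor.

    All defaults are assumptions: 10% overhead with a 384 MB floor,
    and a 1 GB YARN allocation increment.
    """
    # Overhead is a fraction of the executor memory, but never below the floor.
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_fraction))
    requested_mb = executor_memory_mb + overhead_mb
    # YARN rounds the request up to the next multiple of its increment.
    return math.ceil(requested_mb / increment_mb) * increment_mb


for heap_mb in (1024, 4096, 8192):
    print(heap_mb, "->", yarn_container_mb(heap_mb))
```

Under these assumptions, a 4096 MB setting for spark.executor.memory ends up occupying a 5120 MB container, which is one way the memory used on the cluster can look larger than the configured value.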

Spark on YARN memory consumption

2016-03-11 Thread Jan Štěrba

Hello, I am experimenting with tuning an on-demand Spark cluster on top of our Cloudera Hadoop. I am running Cloudera 5.5.2 with Spark 1.5 right now, and I am running Spark in yarn-client mode. Right now my main experimentation is with the spark.executor.memory property, and I have noticed a strange

Re: updating the Books section on the Spark documentation page

2016-03-08 Thread Jan Štěrba

You could try creating a pull request on GitHub. -Jan -- Jan Sterba https://twitter.com/honzasterba | http://flickr.com/honzasterba | http://500px.com/honzasterba On Wed, Mar 9, 2016 at 2:45 AM, Mohammed Guller wrote: > Hi - > > > > The Spark documentation page

Re: Saving multiple outputs in the same job

2016-03-08 Thread Jan Štěrba

Hi Andy, it's nice to see that we are not the only ones with the same issues. So far we have not gone as far as you have. What we have done is cache whatever dataframes/RDDs are shared for computing different outputs. This has brought us quite the speedup, but we still see that saving some

Re: 1.6.0 spark.sql datetime conversion problem

2016-03-05 Thread Jan Štěrba

I don't know what's wrong, but I can suggest looking up the source of the UDF and debugging from there. I would think this is some JDK API caveat and not a Spark bug. -- Jan Sterba https://twitter.com/honzasterba | http://flickr.com/honzasterba | http://500px.com/honzasterba On Fri, Mar 4, 2016 at

Re: [Help]: DataframeNAfunction fill method throwing exception

2016-02-25 Thread Jan Štěrba

Just use the coalesce function: df.selectExpr("name", "coalesce(age, 0) as age") -- Jan Sterba https://twitter.com/honzasterba | http://flickr.com/honzasterba | http://500px.com/honzasterba On Fri, Feb 26, 2016 at 5:27 AM, Divya Gehlot wrote: > Hi, > I have dataset which
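
For readers unfamiliar with SQL's coalesce: it returns its first non-null argument, so coalesce(age, 0) replaces a missing age with 0 and leaves every other value alone. The sketch below shows that per-row behaviour in plain Python, with no Spark involved. The `rows` data and the `coalesce` helper are made up for illustration and are not part of any Spark API.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical rows standing in for a DataFrame with "name" and "age" columns.
rows = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": None}]

# Per-row equivalent of selecting "name" and "coalesce(age, 0) as age".
filled = [{"name": r["name"], "age": coalesce(r["age"], 0)} for r in rows]
print(filled)
```

Only None counts as missing here, so an existing age of 0 is kept as it is.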

Re: Running executors missing in sparkUI

2016-02-25 Thread Jan Štěrba

anks > > On Thu, Feb 25, 2016 at 4:28 AM, Jan Štěrba <i...@jansterba.com> wrote: >> >> Hello, >> >> I have quite a weird behaviour that I can't quite wrap my head around. >> I am running Spark on a Hadoop YARN cluster. I have Spark configured >> in such

Running executors missing in sparkUI

2016-02-25 Thread Jan Štěrba

Hello, I have quite a weird behaviour that I can't quite wrap my head around. I am running Spark on a Hadoop YARN cluster. I have Spark configured in such a way that it utilizes all free vcores in the cluster (setting max vcores per executor and number of executors to use all vcores in cluster).
