First off, I would advise against having dots in column names; that's
just playing with fire.
Second, the exception is really strange, since Spark is complaining
about a completely unrelated column. I would like to see the df schema
from before the exception was thrown.
--
Jan Sterba
It very much depends on the logic that generates the new rows. Is it
per row (i.e. without context)? Then you can just convert to an RDD and
perform a map operation on each row.
JavaPairRDD
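Per-row expansion like this maps naturally onto a flatMap over the RDD
(in the Java API, on a JavaPairRDD). Here is a minimal pure-Python
sketch of the semantics only; the row layout and the expand_row helper
are made up for illustration, not Spark API:

```python
def expand_row(row):
    # Generate zero or more new rows from a single input row,
    # using only that row's own fields (no outside context).
    name, count = row
    return [(name, i) for i in range(count)]

rows = [("a", 2), ("b", 1)]
# flatMap semantics: apply expand_row to each row, then flatten.
result = [out for row in rows for out in expand_row(row)]
```

In Spark the list comprehension becomes a single `rdd.flatMap(expand_row)`
call, which keeps the work distributed.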
Hello,
I am experimenting with tuning an on-demand Spark cluster on top of our
Cloudera Hadoop. I am running Cloudera 5.5.2 with Spark 1.5 right now,
and I am running Spark in yarn-client mode.
Right now my main experimentation is about the spark.executor.memory
property, and I have noticed a strange
You could try creating a pull-request on github.
-Jan
--
Jan Sterba
https://twitter.com/honzasterba | http://flickr.com/honzasterba |
http://500px.com/honzasterba
On Wed, Mar 9, 2016 at 2:45 AM, Mohammed Guller wrote:
> Hi -
>
>
>
> The Spark documentation page
Hi Andy,
it's nice to see that we are not the only ones with the same issues. So
far we have not gone as far as you have. What we have done is that we
cache whatever dataframes/RDDs are shared for computing different
outputs. This has brought us quite a speedup, but we still see that
saving some
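The pattern is: compute the shared intermediate once, cache it, and
derive every output from the cached copy. A pure-Python sketch of the
idea (not Spark API; expensive_transform stands in for the shared
dataframe/RDD computation, and the call counter just demonstrates that
the work runs once):

```python
calls = {"n": 0}

def expensive_transform(data):
    # Stand-in for a shared, costly transformation.
    calls["n"] += 1
    return [x * 2 for x in data]

data = [1, 2, 3]
cached = expensive_transform(data)   # analogous to df.cache()
out_a = [x + 1 for x in cached]      # first output reuses the cache
out_b = [x - 1 for x in cached]      # second output reuses it too
```

Without caching, each output would recompute the transformation from
scratch, which is exactly the recomputation Spark does when a shared
lineage is not cached.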
I don't know what's wrong, but I can suggest looking up the source of the UDF
and debugging from there. I would think this is some JDK API caveat and
not a Spark bug.
--
Jan Sterba
On Fri, Mar 4, 2016 at
Just use the coalesce function:
df.selectExpr("name", "coalesce(age, 0) as age")
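For context, SQL's coalesce returns the first non-null of its
arguments, so coalesce(age, 0) replaces null ages with 0. A pure-Python
sketch of that semantics (the helper below is only illustrative, not
Spark's implementation):

```python
def coalesce(*args):
    # SQL COALESCE: return the first non-null argument, else None.
    for a in args:
        if a is not None:
            return a
    return None

ages = [None, 30, None]
filled = [coalesce(age, 0) for age in ages]  # nulls become 0
```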
--
Jan Sterba
On Fri, Feb 26, 2016 at 5:27 AM, Divya Gehlot
wrote:
> Hi,
> I have dataset which
> Thanks
>
> On Thu, Feb 25, 2016 at 4:28 AM, Jan Štěrba <i...@jansterba.com> wrote:
>>
>> Hello,
>>
>> I have quite a weird behaviour that I can't quite wrap my head around.
>> I am running Spark on a Hadoop YARN cluster. I have Spark configured
>> in such
Hello,
I have quite a weird behaviour that I can't quite wrap my head around.
I am running Spark on a Hadoop YARN cluster. I have Spark configured
in such a way that it utilizes all free vcores in the cluster (setting
max vcores per executor and the number of executors so as to use all
vcores in the cluster).
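For a concrete picture of that setup, the sizing could look like this
in spark-defaults.conf (the numbers are made-up placeholders; the point
is that executors × cores should match the cluster's free vcores):

```
# hypothetical cluster with 64 free vcores: 16 executors x 4 cores each
spark.master              yarn-client
spark.executor.instances  16
spark.executor.cores      4
spark.executor.memory     4g
```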