Hi Samir,
either use the *dataframe.na.fill()* method or the *nvl()* SQL function
when selecting features:
val train = sqlContext.sql("SELECT ... nvl(Field, 1.0) AS Field ...
FROM test")
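If you prefer the DataFrame API over SQL, a minimal sketch of the na.fill alternative (the column name and default value here are illustrative, and a SparkSession/DataFrame is assumed to exist):

```scala
// Assumes a DataFrame `df` with a nullable numeric column "Field".
// Replaces nulls in "Field" with 1.0 before selecting features.
val train = df.na.fill(1.0, Seq("Field"))
```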
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Wed, Aug 10, 2016, at 11:19, Yanbo Liang wrote:
> Hi S
f 6 nodes, 16 cores/node, 64 GB RAM/node => gives: 17 executors,
> 19 GB/exec, 5 cores/exec
> No more than 5 cores per executor
> Leave some cores/RAM for the driver
More on the matter here
http://www.slideshare.net/cloudera/top-5-mistakes-to-avoid-when-writing-apache-spark-applications
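The arithmetic behind those numbers can be sketched as follows (the 1-core/1-GB per-node reservation and the ~7% memory overhead are assumptions following the conventions in the linked talk):

```scala
// Hypothetical sizing walk-through for 6 nodes x 16 cores x 64 GB RAM each.
val nodes = 6
val usableCoresPerNode = 16 - 1          // reserve 1 core/node for OS & Hadoop daemons
val coresPerExecutor   = 5               // cap at 5 to keep HDFS client throughput
val executorsPerNode   = usableCoresPerNode / coresPerExecutor   // 3
val totalExecutors     = nodes * executorsPerNode - 1            // 17 (one slot for the driver)
val memPerExecutorGb   = ((64 - 1) / executorsPerNode * (1 - 0.07)).toInt // 19 GB after overhead
```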
temporary table, we add a unique, incremented,
thread-safe id (AtomicInteger) to its name so that only
specific, non-shared temporary tables are used for a test.
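A minimal sketch of that naming scheme (the object and method names are illustrative, not the actual test-suite code):

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper: each test gets its own non-shared temp view name.
object TempTableNames {
  private val counter = new AtomicInteger(0)
  def unique(base: String): String = s"${base}_${counter.incrementAndGet()}"
}

// e.g. df.createOrReplaceTempView(TempTableNames.unique("users"))
```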
--
Bedrytski Aliaksandr
sp...@bedryt.ski
> On Sat, Aug 20, 2016, at 01:25, Everett Anderson wrote:
> Hi!
>
> Just
f I'm wrong), if you already have >1 specs
per test, the CPU will already be saturated, so fully parallel execution
of tests will not give additional gains.
Regards
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Sun, Aug 21, 2016, at 18:30, Everett Anderson wrote:
>
>
> On Sun, A
E,'yyyy-MM-dd') >=
> unix_timestamp(demand_timefence_end_date, 'yyyy-MM-dd')
> """)
This is if demand_timefence_end_date has the 'yyyy-MM-dd' date format.
Regards,
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Wed, Aug 24,
dataframe.
This way it won't hit performance too much.
Regards
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Wed, Aug 24, 2016, at 16:42, Richard Siebeling wrote:
> Hi,
>
> what is the best way to calculate intermediate column statistics like
> the number of empty values
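A one-pass sketch of such intermediate column statistics (the empty-value definition used here, null or empty string, is an assumption):

```scala
import org.apache.spark.sql.functions.{col, count, when}

// Hypothetical sketch: count empty/null values for every column in a single
// pass, so the underlying data is only scanned once.
val emptyCounts = df.select(df.columns.map { c =>
  count(when(col(c).isNull || col(c) === "", true)).alias(s"${c}_empty")
}: _*)
```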
Hi Mich,
I was wondering what the advantages are of using helper methods instead
of one multiline SQL string.
(I rarely (if ever) use helper methods, but maybe I'm missing something)
Regards
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Thu, Aug 25, 2016, at 11:39, Mich Talebzadeh
.
Or (if the file is expected to be larger than bash tools can handle) you
could iterate over the resulting WrappedArray and create a case class
for each line.
PS: I wonder where the *meta* object from the json goes.
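A sketch of that iteration (the `Record` fields, the comma-separated layout, and the `wrappedLines` collection are illustrative assumptions, not the poster's actual schema):

```scala
// Hypothetical sketch: map each raw line in the WrappedArray to a case class.
case class Record(id: Long, name: String)

val records: Seq[Record] = wrappedLines.map { line =>
  val Array(id, name) = line.split(",", 2)
  Record(id.trim.toLong, name.trim)
}
```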
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Mon, Aug 29, 2016, at 11:27
don't really matter.
Regards,
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Wed, Aug 31, 2016, at 11:45, xiefeng wrote:
> I install a spark standalone and run the spark cluster(one master and one
> worker) in a windows 2008 server with 16cores and 24GB memory.
>
> I have d
Hi xiefeng,
Even if your RDDs are tiny and reduced to one partition, there is always
orchestration overhead (sending tasks to executors, collecting results,
etc.); these things are not free.
If you need fast, [near] real-time processing, look towards
spark-streaming.
Regards,
--
Bedrytski
s
ambiguity problems.
Regards
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Fri, Sep 9, 2016, at 19:33, xingye wrote:
> Not sure whether this is the right distribution list that I can ask
> questions. If not, can someone give a distribution list that can find
> someone to help?
>
> I
Hi Saurabh,
you may use the BuildInfo[1] sbt plugin to access values defined in
build.sbt
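A minimal build.sbt sketch (assuming the sbt-buildinfo plugin is already on the plugin classpath; the package name is illustrative):

```scala
// build.sbt
enablePlugins(BuildInfoPlugin)

buildInfoKeys    := Seq[BuildInfoKey](name, version, scalaVersion, sbtVersion)
buildInfoPackage := "com.example.build"

// The generated object is then available in application code:
//   com.example.build.BuildInfo.version
```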
Regards,
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Mon, Sep 19, 2016, at 18:28, Saurabh Malviya (samalviy) wrote:
> Hi,
>
> Is there any way equivalent to profiles in maven in sbt. I want spar
l the executors in one output.
Regards,
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Thu, Sep 22, 2016, at 06:06, Divya Gehlot wrote:
> Hi,
> I have initialised the logging in my spark App
> /* Initialize Logging */
> val log = Logger.getLogger(getClass.getName)
>
> Logger
how to read it as a table (by transforming it to a
DataFrame)
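A minimal sketch of that transformation (the case class, data, and view name are illustrative):

```scala
// Assumes a SparkSession `spark`; the import brings in the .toDF() implicit.
import spark.implicits._

case class Person(name: String, age: Int)

val df = Seq(Person("alice", 30), Person("bob", 25)).toDF()
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 26").show()
```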
Regards
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Sun, Sep 25, 2016, at 23:41, Koert Kuipers wrote:
> after having gotten used to have case classes represent complex
> structures in Datasets, i am surprised to find out tha
'Nan'
> """)
This query filters out rows containing NaN for a table with 3 columns.
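Spelled out, a hypothetical version of that query for columns c1..c3 could look like the following (the string comparison assumes the columns were read as strings; for numeric columns, the built-in isnan() would be the idiomatic test):

```scala
// Hypothetical reconstruction: drop rows where any of the three columns is NaN.
val clean = spark.sql("""
  SELECT * FROM t
  WHERE c1 != 'NaN' AND c2 != 'NaN' AND c3 != 'NaN'
""")
```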
Regards,
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Mon, Sep 26, 2016, at 09:30, muhammet pakyürek wrote:
>
> is there any way to do this directly. if its not, is there any todo
> this indirectly using another datastrcutures of spark
>
Hi Muhammet,
Python also supports SQL queries:
http://spark.apache.org/docs/latest/sql-programming-guide.html#running-sql-queries-programmatically
Regards,
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Mon, Sep 26, 2016, at 10:01, muhammet pakyürek wrote:
>
>
>
> but my requst i
lines")
spark.sql("SELECT cast(value as FLOAT) from lines").show()
+-----+
|value|
+-----+
| null|
|  1. |
| null|
| 8.6 |
+-----+
After it you may filter the DataFrame for values containing null.
Regards,
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Wed, Sep 28, 2016, at 10
y lose the optimisations given by lining up the 3 steps
in one operation).
If there is a second action executed on any of the transformations,
persisting the farthest common transformation would be a good idea.
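A sketch of the idea (the transformations and column names are illustrative):

```scala
import org.apache.spark.sql.functions.col

// Hypothetical sketch: two actions share the filter+select lineage,
// so persist the farthest common transformation once.
val common = raw.filter(col("status") === "ok")
                .select(col("key"), col("value"))
                .persist()

val counts = common.groupBy("key").count().collect()  // action 1: computes and caches
val total  = common.count()                           // action 2: reuses the cached data
```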
Regards,
--
Bedrytski Aliaksandr
sp...@bedryt.ski
On Thu, Sep 29, 2016, at 07:09, Shus