Hi Meihua,
For categorical features, the ordinal issue can be solved by trying
all 2^(q-1) - 1 possible partitions of the q values into two
groups. However, this is computationally expensive. In Hastie's book,
section 9.2.4, the trees can be trained by sorting the categories by
their mean residual and learning splits as if they were ordered values.
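The sorting trick from 9.2.4 can be sketched in plain Python (a minimal, self-contained illustration, not Spark code; the function and variable names here are my own): order the q levels by their mean target value, then only the q-1 ordered split points need to be scanned instead of all 2^(q-1) - 1 subsets.

```python
from collections import defaultdict

def best_categorical_split(categories, targets):
    """Best binary split of a categorical feature (regression case):
    order levels by mean target, then scan the q-1 ordered split
    points instead of all 2^(q-1) - 1 subsets. Assumes >= 2 levels."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1

    # Order the levels by their mean target value.
    levels = sorted(sums, key=lambda c: sums[c] / counts[c])
    total_sum = sum(sums.values())
    total_cnt = sum(counts.values())

    best = None
    left_sum = left_cnt = 0.0
    for i in range(len(levels) - 1):
        left_sum += sums[levels[i]]
        left_cnt += counts[levels[i]]
        right_sum = total_sum - left_sum
        right_cnt = total_cnt - left_cnt
        # Maximizing the between-group sum of squares minimizes the
        # within-group squared error, since the total is fixed.
        score = left_sum ** 2 / left_cnt + right_sum ** 2 / right_cnt
        if best is None or score > best[0]:
            best = (score, set(levels[: i + 1]))
    return best[1]  # the set of levels sent to the left child

cats = ["a", "a", "b", "b", "c", "c"]
ys = [0.0, 1.0, 5.0, 6.0, 0.5, 0.6]
# Groups the low-mean levels "a" and "c" against "b".
print(best_categorical_split(cats, ys))
```

The same scan generalizes to binary classification by ordering levels by the proportion falling in class 1, which is what makes the trick attractive inside a tree learner.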
Yup looks like I missed that. I will build a new one.
On Tuesday, October 27, 2015, Sean Owen wrote:
> Ah, good point. I also see it still reads 1.5.1. I imagine we just need
> another sweep to update all the version strings.
>
> On Tue, Oct 27, 2015 at 3:08 AM, Krishna
Ah, good point. I also see it still reads 1.5.1. I imagine we just need
another sweep to update all the version strings.
On Tue, Oct 27, 2015 at 3:08 AM, Krishna Sankar wrote:
> Guys,
> The sc.version returns 1.5.1 in Python and Scala. Is anyone getting the
> same
Hi! I was trying out some aggregate functions in Spark SQL and I noticed
that certain aggregate operators are not working. These include:
approxCountDistinct
countDistinct
mean
sumDistinct
For example, using countDistinct results in an error saying
Exception in thread "main"
Hi DB Tsai,
Thank you again for your insightful comments!
1) I agree the sorting method you suggested is a very efficient way to
handle the unordered categorical variables in binary classification
and regression. I propose we have a Spark ML Transformer to do the
sorting and encoding, bringing
Got it, thank you.
Yup, avg works fine. So we have alternate functions to use in place of the
functions pointed out earlier. But my point is: are those original
aggregate functions not supposed to be used, am I using them in the wrong
way, or is it a bug, as I asked in my first mail?
On Wed, Oct 28, 2015 at 3:20
I am getting a Spark not-serializable exception when running spark-submit in
standalone mode. I am trying to use Spark Streaming, which gets its stream from
Kafka queues, but it is not able to process the mapping actions on the RDDs
from the stream. The code where the serialization
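The usual cause of this kind of exception is a closure or enclosing object that drags a non-serializable resource into the task shipped to the executors. As an analogy in plain Python (not the Spark API; the helper name is my own), the same thing happens with pickle, and the fix is the same: ship only serializable state, and create connections and the like where the work actually runs.

```python
import pickle
import threading

def is_picklable(obj):
    """Return True if obj survives serialization, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

# A lock stands in for a connection, socket, or client held by the driver:
# it cannot be serialized, so any closure that captures it fails to ship.
resource = threading.Lock()
print(is_picklable(resource))        # False

# Plain data serializes fine, so closures should capture only this kind
# of state and construct heavyweight resources inside the task itself.
print(is_picklable({"offset": 1}))   # True
```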
When enabling mergedSchema and predicate filtering, this fails because Parquet
filters are pushed down regardless of the schema of each split (or rather
each file).
Dominic Ricard reported this issue (
https://issues.apache.org/jira/browse/SPARK-11103)
Even though this would work okay by setting
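The sentence above is cut off in the archive; as a hedged sketch (my assumption, not the author's confirmed workaround), one relevant knob is Spark's Parquet filter-pushdown setting, which can be turned off in spark-defaults.conf:

```
spark.sql.parquet.filterPushdown  false
```

Disabling pushdown avoids evaluating filters against each file's schema, at the cost of reading more data before filtering.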
Oops, it seems I made a mistake. The error message is: Exception in thread
"main" org.apache.spark.sql.AnalysisException: undefined function
countDistinct
On 27 Oct 2015 15:49, "Shagun Sodhani" wrote:
> Hi! I was trying out some aggregate functions in SparkSql and I
Try
count(distinct columnname)
In SQL, distinct is not part of the function name.
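A quick way to see the syntax point — DISTINCT is a keyword inside the aggregate call, not part of the function name. Sketched here with sqlite3 so the example is self-contained, though the same COUNT(DISTINCT ...) form works in Spark SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("a",), ("a",), ("b",)])

# DISTINCT goes inside the call: COUNT(DISTINCT col), not countDistinct(col).
n = conn.execute("SELECT COUNT(DISTINCT col) FROM t").fetchone()[0]
print(n)  # → 2
```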
On Tuesday, October 27, 2015, Shagun Sodhani
wrote:
> Oops seems I made a mistake. The error message is : Exception in thread
> "main" org.apache.spark.sql.AnalysisException: undefined