Re: Spark Implementation of XGBoost

2015-10-27 Thread DB Tsai
Hi Meihua, For categorical features, the ordinal issue can be solved by trying all kinds of different partitions, 2^(q-1) - 1 of them for q values split into two groups. However, it's computationally expensive. In Hastie's book, section 9.2.4, the trees can be trained by sorting the residuals and being learnt as if they
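The sorted-means trick referenced above (Hastie et al., section 9.2.4) can be sketched in a few lines: for squared-error regression, ordering the q categories by their mean residual and scanning only the q - 1 contiguous splits finds the same optimal binary partition as the exhaustive 2^(q-1) - 1 search. A minimal illustration with made-up data (category names and values are purely illustrative):

```python
from itertools import combinations

# Hypothetical residuals per category (illustrative data, not from the thread)
data = {"red": [1.0, 1.2], "green": [0.1, 0.3], "blue": [2.0, 2.2]}

def sse(vals):
    if not vals:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def split_cost(left_cats, right_cats):
    left = [v for c in left_cats for v in data[c]]
    right = [v for c in right_cats for v in data[c]]
    return sse(left) + sse(right)

cats = list(data)

# Exhaustive search over all nontrivial binary partitions: O(2^q)
best_exhaustive = min(
    split_cost(list(s), [c for c in cats if c not in s])
    for r in range(1, len(cats))
    for s in combinations(cats, r)
)

# Sorted-means shortcut: order categories by mean residual, then check
# only the q - 1 contiguous splits in that order: O(q log q)
order = sorted(cats, key=lambda c: sum(data[c]) / len(data[c]))
best_sorted = min(split_cost(order[:i], order[i:]) for i in range(1, len(order)))

assert abs(best_exhaustive - best_sorted) < 1e-12
```

For binary classification the same ordering applies with the proportion of positives in each category in place of the mean residual.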

Re: [VOTE] Release Apache Spark 1.5.2 (RC1)

2015-10-27 Thread Reynold Xin
Yup looks like I missed that. I will build a new one. On Tuesday, October 27, 2015, Sean Owen wrote: > Ah, good point. I also see it still reads 1.5.1. I imagine we just need > another sweep to update all the version strings. > > On Tue, Oct 27, 2015 at 3:08 AM, Krishna

Re: [VOTE] Release Apache Spark 1.5.2 (RC1)

2015-10-27 Thread Sean Owen
Ah, good point. I also see it still reads 1.5.1. I imagine we just need another sweep to update all the version strings. On Tue, Oct 27, 2015 at 3:08 AM, Krishna Sankar wrote: > Guys, >The sc.version returns 1.5.1 in python and scala. Is anyone getting the > same
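The "sweep to update all the version strings" described above is mechanical; a toy sketch (temp directory and file names are illustrative, not Spark's actual layout):

```shell
set -e
tmp=$(mktemp -d)
printf 'version := "1.5.1"\n' > "$tmp/version.sbt"
printf '__version__ = "1.5.1"\n' > "$tmp/version.py"

# Find every file that still carries the stale version string...
grep -rl '1\.5\.1' "$tmp"

# ...and rewrite it to the release version.
# (-i.bak keeps the invocation portable between GNU and BSD sed)
for f in $(grep -rl '1\.5\.1' "$tmp"); do
  sed -i.bak 's/1\.5\.1/1.5.2/g' "$f"
done

grep -r '1\.5\.2' "$tmp"
```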

Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
Hi! I was trying out some aggregate functions in SparkSql and I noticed that certain aggregate operators are not working. These include: approxCountDistinct, countDistinct, mean, sumDistinct. For example, using countDistinct results in an error saying *Exception in thread "main"

Re: Spark Implementation of XGBoost

2015-10-27 Thread Meihua Wu
Hi DB Tsai, Thank you again for your insightful comments! 1) I agree the sorting method you suggested is a very efficient way to handle the unordered categorical variables in binary classification and regression. I propose we have a Spark ML Transformer to do the sorting and encoding, bringing

Re: using JavaRDD in spark-redis connector

2015-10-27 Thread Rohith P
Got it, thank you.

Re: Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
Yup, avg works fine. So we have alternate functions to use in place of the functions pointed out earlier. But my point is: are those original aggregate functions not supposed to be used, am I using them in the wrong way, or is it a bug, as I asked in my first mail? On Wed, Oct 28, 2015 at 3:20

Task not serializable exception

2015-10-27 Thread Rohith Parameshwara
I am getting this Task not serializable exception when running spark-submit in standalone mode. I am trying to use Spark Streaming, which gets its stream from Kafka queues, but it is not able to process the mapping actions on the RDDs from the stream. The code where the serialization
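Spark serializes the closures it ships to executors, so a closure that drags in a non-serializable enclosing object (a Kafka consumer, a socket, a lock) fails exactly this way. A minimal Python sketch of the same failure mode, using pickle as a stand-in for Spark's closure serializer (class and field names are made up for illustration):

```python
import pickle
import threading

class StreamJob:
    def __init__(self):
        self.lock = threading.Lock()  # not picklable, like a live Kafka connection
        self.factor = 3

job = StreamJob()

# BAD: shipping the whole object fails because of the lock it holds
try:
    pickle.dumps(job)
    shipped_whole_object = True
except TypeError:
    shipped_whole_object = False

# GOOD: copy just the field the task needs into a local variable first,
# so the closure captures a plain value instead of the enclosing object
factor = job.factor
payload = pickle.dumps(factor)

assert not shipped_whole_object
assert pickle.loads(payload) == 3
```

The usual Spark-side fix follows the same pattern: assign the fields a task needs to local vals before the map/foreach closure, or mark the enclosing reference transient.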

Filter applied on merged Parquet schemas with new column fails.

2015-10-27 Thread Hyukjin Kwon
When enabling mergeSchema and predicate filtering, this fails since Parquet filters are pushed down regardless of the schema of each split (or rather each file). Dominic Ricard reported this issue (https://issues.apache.org/jira/browse/SPARK-11103). Even though this would work okay by setting
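The workaround the message alludes to is presumably disabling filter pushdown while keeping schema merging, so filters are evaluated against the merged schema instead of each file's footer. A configuration sketch (both confs exist in Spark 1.5; requires a live SQLContext):

```python
# Workaround sketch: keep schema merging but turn off Parquet filter
# pushdown, at the cost of reading more row groups.
sqlContext.setConf("spark.sql.parquet.mergeSchema", "true")
sqlContext.setConf("spark.sql.parquet.filterPushdown", "false")
```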

Re: Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
Oops, seems I made a mistake. The error message is: Exception in thread "main" org.apache.spark.sql.AnalysisException: undefined function countDistinct On 27 Oct 2015 15:49, "Shagun Sodhani" wrote: > Hi! I was trying out some aggregate functions in SparkSql and I

Re: Exception when using some aggregate operators

2015-10-27 Thread Reynold Xin
Try count(distinct columnName). In SQL, distinct is not part of the function name. On Tuesday, October 27, 2015, Shagun Sodhani wrote: > Oops seems I made a mistake. The error message is : Exception in thread > "main" org.apache.spark.sql.AnalysisException: undefined
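The point here is SQL syntax rather than anything Spark-specific: in a SQL string, DISTINCT is a modifier inside the aggregate call, while countDistinct is a DataFrame-API function. Any SQL engine behaves the same for this form; a quick illustration with sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (2,), (3,)])

# count(distinct x) is valid SQL; countDistinct(x) is not a SQL function
(n_distinct,) = conn.execute("SELECT count(DISTINCT x) FROM t").fetchone()
(n_all,) = conn.execute("SELECT count(x) FROM t").fetchone()

assert n_distinct == 3  # duplicates collapsed
assert n_all == 4
```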