Synonym handling replacement issue with UDF in Apache Spark

2017-04-28 Thread Nishanth
public String call(String t1, String t2, String t3) throws Exception {
    return t1.replaceAll(t2, t3);
}
};
spark.udf().register("replaceWithTerm", replaceWithTerm, DataTypes.StringType);
Dataset joined = sentenceDataFrame.join(sentenceDataFrame2, callUDF("contains", sen
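A likely source of trouble in the snippet above is that `String.replaceAll` treats its first argument as a regular expression, so a synonym term containing regex metacharacters (e.g. `C++` or `a.b`) will not be replaced literally. A minimal standalone sketch of a safe literal replacement, using `Pattern.quote` and `Matcher.quoteReplacement` (the class and method names here are illustrative, not from the original thread):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SynonymReplace {
    // Replace literal occurrences of 'term' with 'replacement'.
    // Quoting both sides prevents regex metacharacters in the synonym
    // (e.g. "+", ".", "$") from being interpreted as patterns.
    public static String replaceLiteral(String text, String term, String replacement) {
        return text.replaceAll(Pattern.quote(term), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("I love C++", "C++", "Cpp")); // I love Cpp
    }
}
```

The same quoting applies unchanged inside a Spark UDF body, since the UDF's `call` method is plain Java.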

Synonym handling replacement issue with UDF in Apache Spark

2017-04-27 Thread Nishanth
public String call(String t1, String t2, String t3) throws Exception {
    return t1.replaceAll(t2, t3);
}
};
spark.udf().register("replaceWithTerm", replaceWithTerm, DataTypes.StringType);
Dataset joined = sentenceDataFrame.join(sentenceDataFrame2, callUDF("contains", sen

Stopping criteria for gradient descent

2015-09-16 Thread Nishanth P S
Code for convergence criteria: https://github.com/apache/spark/blob/c0e9ff1588b4d9313cc6ec6e00e5c7663eb67910/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L251 Thanks, Nishanth
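For readers who do not want to chase the link: MLlib's gradient descent stops early when the weight vector changes little between iterations, relative to the size of the current weights. A minimal sketch of that kind of relative-tolerance check (names and the exact scaling are illustrative; consult the linked `GradientDescent.scala` for the authoritative version):

```java
public class ConvergenceCheck {
    // Euclidean (L2) norm of a vector.
    static double norm(double[] v) {
        double s = 0.0;
        for (double x : v) s += x * x;
        return Math.sqrt(s);
    }

    // Converged when ||curr - prev|| < tol * max(||curr||, 1.0),
    // i.e. the update is small relative to the current solution.
    static boolean isConverged(double[] prev, double[] curr, double tol) {
        double[] diff = new double[curr.length];
        for (int i = 0; i < curr.length; i++) diff[i] = curr[i] - prev[i];
        return norm(diff) < tol * Math.max(norm(curr), 1.0);
    }

    public static void main(String[] args) {
        double[] prev = {1.000, 2.000};
        double[] curr = {1.001, 2.001};
        // Tiny relative change between iterations, so this reports converged.
        System.out.println(isConverged(prev, curr, 0.01));
    }
}
```

The `max(..., 1.0)` guard keeps the criterion from becoming impossibly strict when the weights themselves are near zero.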

Re: How to use BigInteger for userId and productId in collaborative Filtering?

2015-01-14 Thread Nishanth P S
Yes, we are close to having more than 2 billion users. In this case, what is the best way to handle this? Thanks, Nishanth On Fri, Jan 9, 2015 at 9:50 PM, Xiangrui Meng men...@gmail.com wrote: Do you have more than 2 billion users/products? If not, you can pair each user/product id with an integer
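The suggestion in the quoted reply is a dictionary encoding: map each distinct external id (a string or BigInteger) to a compact int that fits MLlib's integer id requirement, and keep the reverse map to translate recommendations back. A single-machine sketch of the idea (class and method names are my own; on a real dataset you would build the mapping distributedly, e.g. from an RDD of distinct ids):

```java
import java.util.HashMap;
import java.util.Map;

public class IdMapper {
    private final Map<String, Integer> toInt = new HashMap<>();
    private final Map<Integer, String> toOriginal = new HashMap<>();

    // Assign the next free int to each distinct id.
    // Works as long as the number of distinct ids stays below 2^31.
    public int encode(String originalId) {
        Integer code = toInt.get(originalId);
        if (code == null) {
            code = toInt.size();
            toInt.put(originalId, code);
            toOriginal.put(code, originalId);
        }
        return code;
    }

    // Translate a model's int id back to the external id.
    public String decode(int code) {
        return toOriginal.get(code);
    }
}
```

Encoding is idempotent: calling `encode` twice with the same id returns the same int, so the same mapper can be applied to both the training ratings and the ids you later score.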

Re: How to use BigInteger for userId and productId in collaborative Filtering?

2015-01-14 Thread Nishanth P S
Hi Xiangrui, Thanks for the reply! I will explore the suggested solutions. -Nishanth Hi Nishanth, Just found out where you work:) We had some discussion in https://issues.apache.org/jira/browse/SPARK-2465 . Having long IDs will increase the communication cost, which may not be worth the benefit