UDF3<String, String, String, String> replaceWithTerm = new UDF3<String, String, String, String>() {
  public String call(String t1, String t2, String t3) throws Exception {
    return t1.replaceAll(t2, t3);
  }
};
spark.udf().register("replaceWithTerm", replaceWithTerm, DataTypes.StringType);
// The join condition was truncated in the original message; the column arguments
// to the "contains" UDF below are a plausible reconstruction, not the original code.
Dataset<Row> joined = sentenceDataFrame.join(sentenceDataFrame2,
    callUDF("contains", sentenceDataFrame.col("sentence"), sentenceDataFrame2.col("label")));
Code for convergence criteria:
https://github.com/apache/spark/blob/c0e9ff1588b4d9313cc6ec6e00e5c7663eb67910/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L251
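For readers without the source open, the test at that line is a relative-tolerance
check on the weight update. Below is a minimal Java sketch of the same logic (the
Spark source itself is Scala; the names here are illustrative):

// Stop when the L2 norm of the weight update is small relative to the
// current weight norm, floored at 1.0 to handle near-zero weights.
static boolean isConverged(double[] previous, double[] current, double convergenceTol) {
  double diffSq = 0.0;
  double currSq = 0.0;
  for (int i = 0; i < current.length; i++) {
    double d = current[i] - previous[i];
    diffSq += d * d;
    currSq += current[i] * current[i];
  }
  return Math.sqrt(diffSq) < convergenceTol * Math.max(Math.sqrt(currSq), 1.0);
}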
Thanks,
Nishanth
Yes, we are close to having more than 2 billion users. In that case, what is the
best way to handle this?
Thanks,
Nishanth
On Fri, Jan 9, 2015 at 9:50 PM, Xiangrui Meng men...@gmail.com wrote:
Do you have more than 2 billion users/products? If not, you can pair
each user/product id with an integer
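A minimal sketch of that pairing, assuming the raw ids arrive as strings (the RDD
and variable names are illustrative, not from the thread): zipWithIndex assigns each
distinct id a dense Long index, which fits in an Int while there are fewer than
2^31 distinct users.

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;

// rawUserIds: one entry per rating's raw user id (hypothetical input RDD).
JavaRDD<String> distinctUserIds = rawUserIds.distinct();
JavaPairRDD<String, Long> userIdToIndex = distinctUserIds.zipWithIndex();
// Join this mapping back onto the ratings before calling ALS, which takes Int ids;
// the Long index is safe to cast while the distinct-user count stays below 2^31.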
Hi Xiangrui,
Thanks for the reply! I will explore the suggested solutions.
-Nishanth
Hi Nishanth,
Just found out where you work :) We had some discussion in
https://issues.apache.org/jira/browse/SPARK-2465. Having long IDs
will increase the communication cost, which may not be worth the benefit.