[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17092 ping @Yunni --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17092 Merged build finished. Test PASSed.
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17092 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2121/ Test PASSed.
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user kturgut commented on the issue: https://github.com/apache/spark/pull/17092 @jkbradley @MLnick @sethah @Yunni @merlintang @akatz It seems LSH will be a perfect fit for matching patient records, if only I can figure out how to assign different weights to each column of the patient record that I am comparing. For instance, each record may have zero or more identifiers. If the identifiers match exactly, we consider it a solid match. However, if the IDs do not match strongly, we also look at an additional set of fields, such as name, birthdate, and address, at different weights. For instance, an exact name match is stronger than a match with small typos. To give different weights to each field we are comparing, do I have to write a custom distance calculator? Or should I perhaps do MinHashing and then LSH as a second step, as described in this document: http://web.stanford.edu/class/cs345a/slides/05-LSH.pdf? It does not look like AND-OR amplification would help with that, as it takes the number of hash functions as input, and it does not seem like we have control over the sensitivity of the hash functions. I would really appreciate your guidance.
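The matching rule described in the comment above (a shared identifier is a solid match, otherwise name, birthdate, and address are compared at different weights) can be sketched as a scoring function applied to candidate pairs. This is only an illustrative sketch in plain Python, not part of Spark's LSH API or of this PR: the record layout, the `FIELD_WEIGHTS` values, and the function names are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever typo-tolerant string similarity would actually be used.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; the values are illustrative, not tuned.
FIELD_WEIGHTS = {"name": 0.5, "birthdate": 0.3, "address": 0.2}


def field_similarity(a, b):
    """Return a similarity in [0, 1]: 1.0 for identical strings, lower for typos."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b):
    """Score a candidate pair of patient records.

    Any shared identifier counts as a solid match (score 1.0). Otherwise the
    score is the weighted sum of the per-field similarities.
    """
    if set(rec_a.get("ids", [])) & set(rec_b.get("ids", [])):
        return 1.0
    return sum(
        weight * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in FIELD_WEIGHTS.items()
    )
```

Under this reading, LSH would only be used to narrow down candidate pairs, and a function like `match_score` would then re-rank those candidates with the per-field weights. Whether that split suits the patient-matching use case is an assumption, not something settled in this thread.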
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/17092 @MLnick @jkbradley @sethah Could you review this? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/17092 Ping.
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/17092 @jkbradley @sethah Please review this when you have time. Thanks!
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user merlintang commented on the issue: https://github.com/apache/spark/pull/17092 @Yunni I tested this patch locally and it works, but I have one idea to improve it. We can discuss it in another ticket.
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user merlintang commented on the issue: https://github.com/apache/spark/pull/17092 @Yunni OK, let us discuss the further optimization step in another ticket. The current patch LGTM.
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/17092 @jkbradley @MLnick Here is a clean PR. Sorry for messing up the previous one! @merlintang I am happy to continue our discussion here: https://issues.apache.org/jira/browse/SPARK-19771, as OR-AND amplification requires many more changes than SPARK-18450.
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17092 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73550/ Test PASSed.
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17092 **[Test build #73550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73550/testReport)** for PR 17092 at commit [`9dd87ba`](https://github.com/apache/spark/commit/9dd87ba21a025939df7020ff1491a2c6c29f2d93). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17092 Merged build finished. Test PASSed.
[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17092 **[Test build #73550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73550/testReport)** for PR 17092 at commit [`9dd87ba`](https://github.com/apache/spark/commit/9dd87ba21a025939df7020ff1491a2c6c29f2d93).