[jira] [Commented] (SOLR-9186) Logistic regression modeling for text
[ https://issues.apache.org/jira/browse/SOLR-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320544#comment-15320544 ]

Joel Bernstein commented on SOLR-9186:

I'm focusing on SOLR-9193 first. So feel free to submit a patch. Let me know if you'd like to discuss the design before you start.

> Logistic regression modeling for text
> -------------------------------------
>
> Key: SOLR-9186
> URL: https://issues.apache.org/jira/browse/SOLR-9186
> Project: Solr
> Issue Type: New Feature
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
>
> SOLR-8492 optimizes a logistic regression model for numeric fields. While
> this is interesting, I think it would be more interesting to build logistic
> regression models on text within an inverted index.
> This ticket will use the same *parallel iterative framework* as SOLR-8492,
> but different data access patterns on the shards, to optimize a logistic
> regression model on text.
> This will support use cases such as building models for spam detection,
> sentiment analysis and threat detection.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
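The *parallel iterative framework* named in the description can be illustrated with a minimal single-process sketch. It assumes the framework works roughly like this: on each iteration, every shard computes the log-loss gradient over its own documents for the current weights, and a coordinator sums the shard gradients and takes one batch gradient-descent step. Documents are assumed to be sparse term-weight dicts with 0/1 labels. The names `shard_gradient`, `train` and `predict` are hypothetical, and this is not the SOLR-8492 code or any Solr API.

```python
import math
from typing import Dict, List, Tuple

# A document: sparse feature vector (term -> weight) plus a 0/1 label.
Doc = Tuple[Dict[str, float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def shard_gradient(docs: List[Doc], weights: Dict[str, float], bias: float):
    """Hypothetical per-shard step: log-loss gradient over local docs only."""
    grad: Dict[str, float] = {}
    grad_bias = 0.0
    for features, label in docs:
        z = bias + sum(weights.get(t, 0.0) * v for t, v in features.items())
        err = sigmoid(z) - label  # derivative of the log loss w.r.t. z
        for t, v in features.items():
            grad[t] = grad.get(t, 0.0) + err * v
        grad_bias += err
    return grad, grad_bias, len(docs)


def train(shards: List[List[Doc]], iterations: int = 200, rate: float = 0.5):
    """Hypothetical coordinator: sum shard gradients, then update the model."""
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(iterations):
        total: Dict[str, float] = {}
        total_bias, n = 0.0, 0
        # In a real cluster these calls would run in parallel, one per shard.
        for docs in shards:
            g, gb, count = shard_gradient(docs, weights, bias)
            for t, v in g.items():
                total[t] = total.get(t, 0.0) + v
            total_bias += gb
            n += count
        # One gradient-descent step on the averaged gradient.
        for t, v in total.items():
            weights[t] = weights.get(t, 0.0) - rate * v / n
        bias -= rate * total_bias / n
    return weights, bias


def predict(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in features.items()))


if __name__ == "__main__":
    shards = [
        [({"free": 1.0, "money": 1.0}, 1), ({"meeting": 1.0, "agenda": 1.0}, 0)],
        [({"free": 1.0, "prize": 1.0}, 1), ({"agenda": 1.0, "notes": 1.0}, 0)],
    ]
    w, b = train(shards)
    print(predict({"free": 1.0, "prize": 1.0}, w, b) > 0.5)  # True
```

Only the summed gradient crosses shard boundaries in this sketch, so the per-iteration network cost scales with the number of features rather than the number of documents.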
[jira] [Commented] (SOLR-9186) Logistic regression modeling for text
[ https://issues.apache.org/jira/browse/SOLR-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319855#comment-15319855 ]

Cao Manh Dat commented on SOLR-9186:

That would be a fine assumption. Are you working on this ticket? I would like to submit a patch in a couple of days.
[jira] [Commented] (SOLR-9186) Logistic regression modeling for text
[ https://issues.apache.org/jira/browse/SOLR-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319758#comment-15319758 ]

Joel Bernstein commented on SOLR-9186:

I think to start we would use one field. I was thinking tf, but tf-idf is an interesting thought; I suspect it would be better. I'm using tf-idf for SOLR-9193, which I think will be really nice.
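To make the tf versus tf-idf choice concrete, here is a small sketch that turns the text of a single field into sparse document vectors either way. It assumes a naive lowercase whitespace tokenizer in place of the field's real analyzer and uses idf = log(N / df); both are illustrative assumptions (Lucene's similarity classes use their own idf formulas). `doc_vectors` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import Dict, List


def doc_vectors(texts: List[str], use_idf: bool) -> List[Dict[str, float]]:
    """Turn one field's text into sparse vectors (term -> weight).

    Tokenization is a naive lowercase whitespace split, standing in for
    the field's actual analyzer.
    """
    tfs = [Counter(text.lower().split()) for text in texts]
    if not use_idf:
        # Raw term frequency per document.
        return [{t: float(c) for t, c in tf.items()} for tf in tfs]
    n = len(texts)
    # Document frequency: number of docs containing each term.
    df = Counter(t for tf in tfs for t in tf)
    # tf * log(N / df): a term present in every doc gets weight 0.
    return [{t: c * math.log(n / df[t]) for t, c in tf.items()} for tf in tfs]


if __name__ == "__main__":
    texts = ["win free money", "free lunch meeting", "meeting agenda meeting"]
    print(doc_vectors(texts, use_idf=False)[2])  # {'meeting': 2.0, 'agenda': 1.0}
    print(doc_vectors(texts, use_idf=True)[2])
```

With tf alone, "meeting" dominates the third document; with tf-idf, the rarer "agenda" gets the higher weight because "meeting" also appears in another document.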
[jira] [Commented] (SOLR-9186) Logistic regression modeling for text
[ https://issues.apache.org/jira/browse/SOLR-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319613#comment-15319613 ]

Cao Manh Dat commented on SOLR-9186:

It would be an interesting idea. I just have some questions:
- Do we classify based on one field or many?
- To represent a doc as a vector, should we use tf-idf or just tf? Does that mean the field must have term vectors stored?
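On the term-vector question: an inverted index is keyed by term (term -> postings of doc ids and frequencies), while a document vector needs the reverse mapping (doc -> term frequencies), which is what stored term vectors provide per document. If term vectors are not stored, one alternative is to walk the postings of a fixed, limited set of feature terms and invert them into per-document vectors. The sketch below models postings as plain Python dicts; it is a conceptual illustration under that assumption, not Lucene's API, and `invert_postings` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Inverted index for one field: term -> {doc_id: term frequency}.
Postings = Dict[str, Dict[int, int]]


def invert_postings(index: Postings, features: Iterable[str]) -> Dict[int, Dict[str, int]]:
    """Build doc_id -> {term: tf} by walking the postings of selected terms only."""
    vectors: Dict[int, Dict[str, int]] = defaultdict(dict)
    for term in features:
        for doc_id, freq in index.get(term, {}).items():
            vectors[doc_id][term] = freq
    return dict(vectors)


if __name__ == "__main__":
    index: Postings = {
        "free": {0: 1, 1: 2},
        "meeting": {1: 1, 2: 2},
        "agenda": {2: 1},
    }
    # Only the chosen features are read; "meeting" is never touched.
    print(invert_postings(index, ["free", "agenda"]))
    # {0: {'free': 1}, 1: {'free': 2}, 2: {'agenda': 1}}
```

Documents that contain none of the selected terms get no entry, and the work done is proportional to the postings of the chosen features rather than to the whole vocabulary.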