[jira] [Commented] (SOLR-1844) CommonGramsQueryFilterFactory should read words in a comma-delimited format
[ https://issues.apache.org/jira/browse/SOLR-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044971#comment-13044971 ]

Steven Rowe commented on SOLR-1844:
---

Hi David,

The link in the description is dead. This one mentions the new400common.txt file: http://www.hathitrust.org/node/181 but I'm not sure it's what you were after.

Looks like this is the sample you're talking about: http://www.hathitrust.org/blogs/large-scale-search/common-word-list-commongrams - I can see the comma-delimited values there.

Would you care to make a patch?

CommonGramsQueryFilterFactory should read words in a comma-delimited format
---

Key: SOLR-1844
URL: https://issues.apache.org/jira/browse/SOLR-1844
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 1.4
Reporter: David Smiley
Priority: Minor

CommonGramsQueryFilterFactory expects that the file(s) given to the words argument is a carriage-return delimited list of words. It doesn't support comments either. This file format should be more flexible to support comma delimited values. I came across this because I was trying to use the sample file provided by HathiTrust: http://www.hathitrust.org/node/180 (named in a file new400common.txt)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1844) CommonGramsQueryFilterFactory should read words in a comma-delimited format
[ https://issues.apache.org/jira/browse/SOLR-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044988#comment-13044988 ]

David Smiley commented on SOLR-1844:

On second thought, I think the current behavior is fine because it's consistent with the other filters that need lists of words, since they all share the same code to do it -- BaseTokenStreamFactory.getWordSet(...). If any change should happen, it should happen there.

I'm fine with this issue being closed as Won't-Fix. It was easy enough for me to simply replace the commas in Hathi's file with a carriage return.
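
For anyone else hitting this, the workaround described above (replacing the commas with line breaks) could be scripted roughly as follows. This is only a sketch and not part of Solr: the function name commas_to_lines is made up for illustration, and it assumes that comment lines in the word file start with "#", which the thread itself does not spell out.

```python
def commas_to_lines(text: str) -> str:
    """Rewrite comma-delimited words as one word per line.

    Assumption (not stated in the thread): lines starting with '#'
    are comments and are passed through unchanged.
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # Drop blank lines.
            continue
        if stripped.startswith("#"):
            # Keep comment lines as they are.
            out.append(stripped)
            continue
        # Split on commas and discard empty entries such as "a,,b".
        out.extend(w.strip() for w in stripped.split(",") if w.strip())
    # End with a newline only when there is something to write.
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    # Hypothetical sample input, not the actual HathiTrust file.
    print(commas_to_lines("the, of,and\n# common words\nto"), end="")
```

The result can then be saved under whatever file name the words argument of the filter points to.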
[jira] Commented: (SOLR-1844) CommonGramsQueryFilterFactory should read words in a comma-delimited format
[ https://issues.apache.org/jira/browse/SOLR-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849571#action_12849571 ]

David Smiley commented on SOLR-1844:

It _does_ support comments; sorry.