Re: Term pollution from binary data

2007-11-06 Thread robert engels
I think the binary section recognizer is probably your best bet. If you write an analyzer that ignores terms that consist only of hexadecimal digits and contain embedded digits, you will probably reduce the pollution quite a bit. It is trivial to write and not too expensive to check.
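
As a rough illustration of the check suggested above, here is a minimal sketch in plain Java with no Lucene dependency. The class and method names (HexTermHeuristic, looksLikeBinaryTerm, dropBinaryTerms) are hypothetical and do not come from the thread or from Lucene. It also assumes one reading of "contain embedded digits": a term is rejected only when it mixes the letters a-f with the digits 0-9, so ordinary words such as "cafe" and plain numbers such as "2007" are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: flags terms that look like
// fragments of hex-encoded binary data.
class HexTermHeuristic {

    // Returns true when every character is a hexadecimal digit AND the term
    // mixes at least one letter a-f with at least one digit 0-9.
    // (Assumption: this is what "contain embedded digits" means.)
    static boolean looksLikeBinaryTerm(String term) {
        boolean sawDigit = false;
        boolean sawHexLetter = false;
        for (int i = 0; i < term.length(); i++) {
            char c = Character.toLowerCase(term.charAt(i));
            if (c >= '0' && c <= '9') {
                sawDigit = true;
            } else if (c >= 'a' && c <= 'f') {
                sawHexLetter = true;
            } else {
                return false; // any non-hex character means a normal term
            }
        }
        return sawDigit && sawHexLetter;
    }

    // Returns a new list containing only the terms that pass the check.
    static List<String> dropBinaryTerms(List<String> terms) {
        List<String> kept = new ArrayList<String>();
        for (String term : terms) {
            if (!looksLikeBinaryTerm(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("lucene", "cafe", "2007", "4f3a9c01", "DEADBEEF01");
        System.out.println(dropBinaryTerms(terms));
    }
}
```

In an actual analyzer the same predicate would sit inside a token filter that skips matching tokens. The cost is one pass over each term's characters, which fits the "not too expensive to check" remark.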

Term pollution from binary data

2007-11-06 Thread Chuck Williams
Hi All, We are experiencing OOMs when binary data contained in text files (e.g., a base64 section of a text file) is indexed. We have extensive recognition of file types but have encountered binary sections inside of otherwise normal text files. We are using the default value of 128 for te

[jira] Closed: (LUCENE-1019) CustomScoreQuery should support multiple ValueSourceQueries

2007-11-06 Thread Kyle Maxwell (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Maxwell closed LUCENE-1019. Resolution: Invalid Lucene Fields: (was: [Patch Available, New]) Ok, I'm satisfied with D

[jira] Updated: (LUCENE-1044) Behavior on hard power shutdown

2007-11-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1044: --- Attachment: LUCENE-1044.take3.patch Attached another rev of the patch. I changed th

[jira] Commented: (LUCENE-1016) TermVectorAccessor, transparent vector space access

2007-11-06 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540518 ] Karl Wettin commented on LUCENE-1016: - I think this is interesting: http://www.nabble.com/How-to-generate-TermF

Re: [jira] Commented: (LUCENE-935) Improve maven artifacts

2007-11-06 Thread Karl Wettin
On 1 Nov 2007 at 17:18, Grant Ingersoll (JIRA) wrote: http://people.apache.org/maven-snapshot-repository/org/apache/lucene/ love++