[jira] [Updated] (SOLR-8590) example/files improvements
[ https://issues.apache.org/jira/browse/SOLR-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-8590:
-------------------------------
    Description:
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{<email>_ss}} and {{<url>_ss}}, with angle brackets in field names), also add display of these fields in /browse results rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, created date, relevancy, and title
* Add grouping by doc_type perhaps
* fix debug mode - currently does not update the parsed query debug output (this is probably a bug in data driven /browse as well)
* Harden update-script: it currently errors if documents do not have a "content" field (eg indexing basic CSV), but should instead skip extraction of e-mail addresses and URLs when no "content". Not quite the use case (no "content") for example/files, but no reason to error in the update script at least.

  was:
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{<email>_ss}} and {{<url>_ss}}, with angle brackets in field names), also add display of these fields in /browse results rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, created date, relevancy, and title
* Add grouping by doc_type perhaps
* fix debug mode - currently does not update the parsed query debug output (this is probably a bug in data driven /browse as well)

> example/files improvements
> --------------------------
>
>                 Key: SOLR-8590
>                 URL: https://issues.apache.org/jira/browse/SOLR-8590
>             Project: Solr
>          Issue Type: Bug
>          Components: examples
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>            Priority: Minor
>             Fix For: 6.0

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
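The update-script hardening requested above (skip e-mail and URL extraction when a document has no "content") could look roughly like the sketch below. It is an illustration in Python, not the example/files update script itself: the function name `process_add`, both regular expressions, and the dict-based document are assumptions made for this sketch, and the output field names `email_ss` and `url_ss` are the ones mentioned elsewhere in this thread.

```python
import re

# Hypothetical patterns for illustration only; the expressions used by the
# real update script are not shown in this thread.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def process_add(doc: dict) -> dict:
    """Add email_ss / url_ss to doc, but only when "content" is present."""
    content = doc.get("content")
    if not content:
        # No "content" (e.g. a basic CSV row): skip extraction instead of failing.
        return doc

    # A multi-valued field is treated as one block of text.
    if isinstance(content, (list, tuple)):
        content = "\n".join(str(value) for value in content)

    emails = sorted(set(EMAIL_RE.findall(content)))
    urls = sorted(set(URL_RE.findall(content)))
    if emails:
        doc["email_ss"] = emails
    if urls:
        doc["url_ss"] = urls
    return doc
```

A document without "content" passes through unchanged, so indexing plain CSV rows no longer depends on the extraction step.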
[jira] [Updated] (SOLR-8590) example/files improvements
[ https://issues.apache.org/jira/browse/SOLR-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-8590:
-------------------------------
    Description:
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{<email>_ss}} and {{<url>_ss}}, with angle brackets in field names), also add display of these fields in /browse results rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, created date, relevancy, and title
* Add grouping by doc_type perhaps
* fix debug mode - currently does not update the parsed query debug output (this is probably a bug in data driven /browse as well)
* Harden update-script: it currently errors if documents do not have a "content" field (eg indexing basic CSV), but should instead skip extraction of e-mail addresses and URLs when no "content". Not quite the use case (no "content") for example/files, but no reason to error in the update script at least.
* Filter out bogus e-mail addresses. I'm seeing {{email_ss = "?@[^],\,/^@[$_a-z]"}} for some documents (using Solr docs/ directory as the dataset)

  was:
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{<email>_ss}} and {{<url>_ss}}, with angle brackets in field names), also add display of these fields in /browse results rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, created date, relevancy, and title
* Add grouping by doc_type perhaps
* fix debug mode - currently does not update the parsed query debug output (this is probably a bug in data driven /browse as well)
* Harden update-script: it currently errors if documents do not have a "content" field (eg indexing basic CSV), but should instead skip extraction of e-mail addresses and URLs when no "content". Not quite the use case (no "content") for example/files, but no reason to error in the update script at least.

> example/files improvements
> --------------------------
>
>                 Key: SOLR-8590
>                 URL: https://issues.apache.org/jira/browse/SOLR-8590
>             Project: Solr
>          Issue Type: Bug
>          Components: examples
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>            Priority: Minor
>             Fix For: 6.0
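One way to filter out bogus e-mail values such as the {{email_ss}} value reported above is to validate each extracted candidate against a stricter pattern before it is stored. The sketch below is only an illustration in Python: `filter_emails` and `VALID_EMAIL` are hypothetical names, and the pattern is a deliberately conservative assumption, not the expression used by example/files.

```python
import re

# Conservative, hypothetical pattern: an alphanumeric first character, a
# restricted local part, and a dotted domain ending in an alphabetic label.
VALID_EMAIL = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9._%+-]*@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}"
)


def filter_emails(candidates):
    """Keep only candidates that fully match VALID_EMAIL, without duplicates."""
    seen = set()
    kept = []
    for candidate in candidates:
        candidate = candidate.strip()
        key = candidate.lower()  # treat addresses as case-insensitive here
        if VALID_EMAIL.fullmatch(candidate) and key not in seen:
            seen.add(key)
            kept.append(candidate)
    return kept
```

With this check, a candidate like the regex fragment quoted in the description is rejected because it does not start with an alphanumeric character and has no dotted domain, while ordinary addresses are kept in their original order.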
[jira] [Updated] (SOLR-8590) example/files improvements
[ https://issues.apache.org/jira/browse/SOLR-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-8590:
-------------------------------
    Attachment: SOLR-8590.patch

This patch fixes the email_ss and url_ss field names, hardens the update script so "content" isn't required, sets a fallback language, and increases the threshold on language detection.

> example/files improvements
> --------------------------
>
>                 Key: SOLR-8590
>                 URL: https://issues.apache.org/jira/browse/SOLR-8590
>             Project: Solr
>          Issue Type: Bug
>          Components: examples
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>            Priority: Minor
>             Fix For: 6.0
>
>         Attachments: SOLR-8590.patch
>
>
> There are several example/files improvements/fixes that are warranted:
> * Fix e-mail and URL field names ({{<email>_ss}} and {{<url>_ss}}, with angle brackets in field names), also add display of these fields in /browse results rendering
> * Improve quality of extracted phrases
> * Extract, facet, and display acronyms
> * Add sorting controls, possibly all or some of these: last modified date, created date, relevancy, and title
> * Add grouping by doc_type perhaps
> * fix debug mode - currently does not update the parsed query debug output (this is probably a bug in data driven /browse as well)
> * Harden update-script: it currently errors if documents do not have a "content" field (eg indexing basic CSV), but should instead skip extraction of e-mail addresses and URLs when no "content". Not quite the use case (no "content") for example/files, but no reason to error in the update script at least.
> * Filter out bogus e-mail addresses. I'm seeing {{email_ss = "?@[^],\,/^@[$_a-z]"}} for some documents (using Solr docs/ directory as the dataset)
[jira] [Updated] (SOLR-8590) example/files improvements
[ https://issues.apache.org/jira/browse/SOLR-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-8590:
-------------------------------
    Description:
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{<email>_ss}} and {{<url>_ss}}, with angle brackets in field names), also add display of these fields in /browse results rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, created date, relevancy, and title
* Add grouping by doc_type perhaps
* fix debug mode - currently does not update the parsed query debug output (this is probably a bug in data driven /browse as well)

  was:
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{<email>_ss}} and {{<url>_ss}}, with angle brackets in field names), also add display of these fields in /browse results rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, created date, relevancy, and title
* Add grouping by doc_type perhaps

> example/files improvements
> --------------------------
>
>                 Key: SOLR-8590
>                 URL: https://issues.apache.org/jira/browse/SOLR-8590
>             Project: Solr
>          Issue Type: Bug
>          Components: examples
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>            Priority: Minor
>             Fix For: 6.0