[jira] [Updated] (SOLR-8590) example/files improvements

2016-01-26 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-8590:
---
Description: 
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{_ss}} and {{_ss}}, with angle 
brackets in field names), also add display of these fields in /browse results 
rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, 
created date, relevancy, and title
* Add grouping by doc_type perhaps
* Fix debug mode: it currently does not update the parsed query debug output 
(this is probably a bug in data-driven /browse as well)
* Harden update-script: it currently errors if documents do not have a 
"content" field (e.g., when indexing basic CSV), but it should instead skip 
extraction of e-mail addresses and URLs when there is no "content". Documents 
without "content" are not really the example/files use case, but there is at 
least no reason for the update script to error.
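A minimal sketch of the "content" hardening described above, assuming documents arrive as plain dicts. Python is used only for illustration: the function name `extract_emails_and_urls` and both regexes are hypothetical stand-ins, not the actual update-script logic. A document with no "content" is returned unchanged instead of raising an error.

```python
import re

# Hypothetical, deliberately simple patterns (illustration only).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://\S+")


def extract_emails_and_urls(doc: dict) -> dict:
    """Add email_ss/url_ss when the doc has "content"; otherwise pass it through."""
    content = doc.get("content")
    if not content:
        # No "content" (e.g., a basic CSV row): skip extraction rather than fail.
        return doc
    if isinstance(content, list):
        # Multi-valued content: join the values into one string to scan.
        content = " ".join(str(c) for c in content)
    emails = sorted(set(EMAIL_RE.findall(content)))
    urls = sorted(set(URL_RE.findall(content)))
    if emails:
        doc["email_ss"] = emails
    if urls:
        doc["url_ss"] = urls
    return doc
```

With this guard, a basic CSV row such as `{"id": "1", "title": "csv row"}` passes through untouched.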

  was:
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{_ss}} and {{_ss}}, with angle 
brackets in field names), also add display of these fields in /browse results 
rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, 
created date, relevancy, and title
* Add grouping by doc_type perhaps
* Fix debug mode: it currently does not update the parsed query debug output 
(this is probably a bug in data-driven /browse as well)


> example/files improvements
> --
>
> Key: SOLR-8590
> URL: https://issues.apache.org/jira/browse/SOLR-8590
> Project: Solr
> Issue Type: Bug
> Components: examples
> Reporter: Erik Hatcher
> Assignee: Erik Hatcher
> Priority: Minor
> Fix For: 6.0
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8590) example/files improvements

2016-01-26 Thread Erik Hatcher (JIRA)


Erik Hatcher updated SOLR-8590:
---
Description: 
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{_ss}} and {{_ss}}, with angle 
brackets in field names), also add display of these fields in /browse results 
rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, 
created date, relevancy, and title
* Add grouping by doc_type perhaps
* Fix debug mode: it currently does not update the parsed query debug output 
(this is probably a bug in data-driven /browse as well)
* Harden update-script: it currently errors if documents do not have a 
"content" field (e.g., when indexing basic CSV), but it should instead skip 
extraction of e-mail addresses and URLs when there is no "content". Documents 
without "content" are not really the example/files use case, but there is at 
least no reason for the update script to error.
* Filter out bogus e-mail addresses. I'm seeing {{email_ss = 
"?@[^],\,/^@[$_a-z]"}} for some documents (using the Solr docs/ directory as 
the dataset)
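One way to filter such values, sketched in Python under the assumption that candidate addresses have already been extracted as strings: keep only values that fully match a deliberately simple address pattern. The pattern and the `filter_emails` name are hypothetical; this is not RFC 5322 validation.

```python
import re

# Hypothetical plausibility check: local part, "@", dotted domain, alphabetic TLD.
PLAUSIBLE_EMAIL = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def filter_emails(candidates):
    """Return only candidates that look like real e-mail addresses, in input order."""
    return [c for c in candidates if PLAUSIBLE_EMAIL.fullmatch(c)]
```

For example, `filter_emails([r"?@[^],\,/^@[$_a-z]", "dev@lucene.apache.org"])` keeps only `dev@lucene.apache.org`, because the bogus value starts with a character the local-part class does not allow.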

  was:
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{_ss}} and {{_ss}}, with angle 
brackets in field names), also add display of these fields in /browse results 
rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, 
created date, relevancy, and title
* Add grouping by doc_type perhaps
* Fix debug mode: it currently does not update the parsed query debug output 
(this is probably a bug in data-driven /browse as well)
* Harden update-script: it currently errors if documents do not have a 
"content" field (e.g., when indexing basic CSV), but it should instead skip 
extraction of e-mail addresses and URLs when there is no "content". Documents 
without "content" are not really the example/files use case, but there is at 
least no reason for the update script to error.


> example/files improvements
> --
>
> Key: SOLR-8590
> URL: https://issues.apache.org/jira/browse/SOLR-8590
> Project: Solr
> Issue Type: Bug
> Components: examples
> Reporter: Erik Hatcher
> Assignee: Erik Hatcher
> Priority: Minor
> Fix For: 6.0
>






[jira] [Updated] (SOLR-8590) example/files improvements

2016-01-26 Thread Erik Hatcher (JIRA)


Erik Hatcher updated SOLR-8590:
---
Attachment: SOLR-8590.patch

This patch fixes the email_ss and url_ss field names, hardens the update script 
so that "content" isn't required, sets a fallback language, and increases the 
threshold for language detection.
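The fallback/threshold combination can be pictured with a small hypothetical Python sketch of the general idea behind Solr's `langid.threshold` and `langid.fallback` update-processor parameters; it is not their implementation. The threshold and fallback values the patch actually uses are not stated here, so the numbers below are illustrative only.

```python
from typing import List, Tuple


def choose_language(candidates: List[Tuple[str, float]], threshold: float, fallback: str) -> str:
    """Pick the most confident detected language if it clears the threshold, else the fallback."""
    if candidates:
        lang, confidence = max(candidates, key=lambda c: c[1])
        if confidence >= threshold:
            return lang
    # Nothing detected, or detection not confident enough: use the fallback.
    return fallback
```

For instance, `choose_language([("it", 0.55), ("es", 0.40)], threshold=0.8, fallback="en")` returns `"en"` because no candidate is confident enough. Raising the threshold trades fewer wrong guesses on short or noisy text for more documents landing on the fallback.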

> example/files improvements
> --
>
> Key: SOLR-8590
> URL: https://issues.apache.org/jira/browse/SOLR-8590
> Project: Solr
> Issue Type: Bug
> Components: examples
> Reporter: Erik Hatcher
> Assignee: Erik Hatcher
> Priority: Minor
> Fix For: 6.0
>
> Attachments: SOLR-8590.patch
>
>
> There are several example/files improvements/fixes that are warranted:
> * Fix e-mail and URL field names ({{_ss}} and {{_ss}}, with angle 
> brackets in field names), also add display of these fields in /browse results 
> rendering
> * Improve quality of extracted phrases
> * Extract, facet, and display acronyms
> * Add sorting controls, possibly all or some of these: last modified date, 
> created date, relevancy, and title
> * Add grouping by doc_type perhaps
> * Fix debug mode: it currently does not update the parsed query debug output 
> (this is probably a bug in data-driven /browse as well)
> * Harden update-script: it currently errors if documents do not have a 
> "content" field (e.g., when indexing basic CSV), but it should instead skip 
> extraction of e-mail addresses and URLs when there is no "content". Documents 
> without "content" are not really the example/files use case, but there is at 
> least no reason for the update script to error.
> * Filter out bogus e-mail addresses. I'm seeing {{email_ss = 
> "?@[^],\,/^@[$_a-z]"}} for some documents (using the Solr docs/ directory as 
> the dataset)






[jira] [Updated] (SOLR-8590) example/files improvements

2016-01-25 Thread Erik Hatcher (JIRA)


Erik Hatcher updated SOLR-8590:
---
Description: 
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{_ss}} and {{_ss}}, with angle 
brackets in field names), also add display of these fields in /browse results 
rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, 
created date, relevancy, and title
* Add grouping by doc_type perhaps
* Fix debug mode: it currently does not update the parsed query debug output 
(this is probably a bug in data-driven /browse as well)
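The proposed sorting controls and doc_type grouping map onto standard Solr request parameters (`sort`, `group`, `group.field`). Below is a hypothetical Python sketch of building those parameters for a /browse request; the field names `last_modified`, `created`, and `title_sort` are assumptions about the schema, while `score desc` is Solr's usual relevancy ordering.

```python
from urllib.parse import urlencode

# Hypothetical sort options; field names are assumptions, not the example/files schema.
SORT_OPTIONS = {
    "relevancy": "score desc",
    "last_modified": "last_modified desc",
    "created": "created desc",
    "title": "title_sort asc",  # assumed single-valued string field for title sorting
}


def browse_params(query, sort_key="relevancy", group_by_doc_type=False):
    """Build a /browse query string for a chosen sort and optional doc_type grouping."""
    params = [("q", query), ("sort", SORT_OPTIONS[sort_key])]
    if group_by_doc_type:
        # Solr result-grouping parameters.
        params += [("group", "true"), ("group.field", "doc_type")]
    return urlencode(params)
```

For instance, `browse_params("solr", "last_modified", group_by_doc_type=True)` yields `q=solr&sort=last_modified+desc&group=true&group.field=doc_type`. Sorting by title generally needs a single-valued field, which is why a separate sort field is assumed rather than the tokenized title.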

  was:
There are several example/files improvements/fixes that are warranted:

* Fix e-mail and URL field names ({{_ss}} and {{_ss}}, with angle 
brackets in field names), also add display of these fields in /browse results 
rendering
* Improve quality of extracted phrases
* Extract, facet, and display acronyms
* Add sorting controls, possibly all or some of these: last modified date, 
created date, relevancy, and title
* Add grouping by doc_type perhaps


> example/files improvements
> --
>
> Key: SOLR-8590
> URL: https://issues.apache.org/jira/browse/SOLR-8590
> Project: Solr
> Issue Type: Bug
> Components: examples
> Reporter: Erik Hatcher
> Assignee: Erik Hatcher
> Priority: Minor
> Fix For: 6.0
>


