[jira] [Comment Edited] (LUCENE-5317) Concordance capability

2016-09-26 Thread Tim Allison (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523067#comment-15523067
]

Tim Allison edited comment on LUCENE-5317 at 9/26/16 1:40 PM:
--

I received a personal email asking for some more background on this capability.
Here goes (apologies for some repetition with the issue description)...

For an example of concordance output, see these
[slides|https://github.com/tballison/share/blob/master/slides/TextProcessingAndAdvancedSearch_tallison_MITRE_201510_final_abbrev.pdf]:
slides 23 and 24 cover LUCENE-5317, and slides 25-28 cover LUCENE-5318.

The notion is that you present every occurrence of the term in a central
column, with {{x}} words of context to the left and right. The user can sort on
the words before the target term to see what modifies it, sort on the words
after the target term to see what it modifies, or sort on order of appearance
within the documents to effectively read everything in their docs that matters
to them.

By {{target term}}, of course, I mean any term/phrase that can be represented
by a SpanQuery.
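
To make the layout concrete, here is a minimal sketch in plain Python. It does
not use Lucene or the patch's API: a single literal token stands in for
whatever a SpanQuery would match, and every name in it ({{Hit}},
{{concordance}}, {{sort_hits}}, {{render}}) is made up for illustration.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    doc_id: int
    position: int   # token offset of the target within its document
    left: tuple     # up to `width` tokens before the target
    target: str
    right: tuple    # up to `width` tokens after the target


def tokenize(text):
    # Deliberately naive: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def concordance(docs, target, width=3):
    """Collect every occurrence of `target` with `width` tokens of context."""
    hits = []
    for doc_id, text in enumerate(docs):
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == target:
                left = tuple(tokens[max(0, i - width):i])
                right = tuple(tokens[i + 1:i + 1 + width])
                hits.append(Hit(doc_id, i, left, tok, right))
    return hits


def sort_hits(hits, order="doc"):
    if order == "left":
        # Compare the word nearest the target first, then move outward.
        return sorted(hits, key=lambda h: h.left[::-1])
    if order == "right":
        return sorted(hits, key=lambda h: h.right)
    # Default: order of appearance, document by document.
    return sorted(hits, key=lambda h: (h.doc_id, h.position))


def render(hits, col=24):
    # Right-align the left context so the target lines up in a central column.
    return [f"{' '.join(h.left):>{col}} | {h.target} | {' '.join(h.right)}"
            for h in hits]


docs = [
    "The quick brown fox jumps over the lazy dog.",
    "A red fox and a brown fox met the dog.",
]
hits = concordance(docs, "fox", width=3)
for line in render(sort_hits(hits, "left")):
    print(line)
```

Sorting on the left context groups the two "brown fox" rows together even
though they come from different documents, which is the cross-document pattern
the concordance view is meant to expose.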

This kind of view of the data is extremely helpful to linguists and
philologists who want to understand how words are being used. It also has
practical applications for anyone doing "analytic" search, that is, anyone who
wants to see every time a term/phrase appears: lawyers, patent examiners, etc.

This view of the data is fundamentally different from snippets, which typically
show the three or so best chunks where the search terms appear and are
typically ordered _per document_. Snippets allow the user to determine whether
a document is relevant, but the user then has to open the document. Snippets
are great if users are seeking the best document to answer their information
need.

For "analytic searchers", however, concordance results save the user the step
of having to open the document; they can see _every time_ their term/phrase
appears. Also, if their documents are lengthy, the concordance allows them to
see the potentially hundreds of times that their term/phrase appears in each
document instead of the three or so snippets they might see with traditional
search engines.

"But you can increase the number of snippets to whatever you want..." Yes, you
can, but the layout of the concordance allows you to see patterns across
documents very easily. Again, the results are sorted by words to the left or
right, not by which document the target appeared in.

This [link|https://wmtang.org/corpus-linguistics/corpus-linguistics] shows some
output from a concordancer (AntConc). Wikipedia's best description is under
key word in context ([KWIC|https://en.wikipedia.org/wiki/Key_Word_in_Context]).
If you're into tree-ware,
[Oakes|https://global.oup.com/academic/product/statistics-for-corpus-linguistics-9780748608171?cc=us=en;]
has a great introduction to concordances among many other useful topics!

was (Author: talli...@mitre.org):
I received a personal email asking for some more background on this capability.
Here goes (apologies for some repetition with the issue description)...

By {{target term}}, of course, I mean any term/phrase that can be represented
by a SpanQuery.

This view of the data is fundamentally different from snippets, which typically
show the three or so best chunks where the search terms appear. Snippets allow
the user to determine if a document is relevant, then the user has to open the
document. Snippets are great if the user is seeking the best document to
answer the information need. For "analytic searchers", however, with
concordance results, the user can be saved the step of having to open the
document; they can see _every time_ their term/phrase appears. Also, for
"analytic searchers", if their documents are lengthy, the concordance allows
them to see the potentially hundreds of

[jira] [Comment Edited] (LUCENE-5317) Concordance capability

2016-09-26 Thread Tim Allison (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523067#comment-15523067
]

Tim Allison edited comment on LUCENE-5317 at 9/26/16 1:38 PM:
--

I received a personal email asking for some more background on this capability.
Here goes (apologies for some repetition with the issue description)...

By {{target term}}, of course, I mean any term/phrase that can be represented
by a SpanQuery.

was (Author: talli...@mitre.org):
I received a personal email asking for some more background on this capability.
Here goes (apologies for some repetition with the issue description)...

By {{target term}}, of course, I mean any term/phrase that can be represented
by a SpanQuery.
