[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful

2012-09-13 Thread Stefan Trcek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455134#comment-13455134
 ] 

Stefan Trcek commented on LUCENE-4369:
--

This is more than a rename, and I don't know whether it is viable:

The idea is: you start thinking about analysis when you add fields
for some purpose, not when you create an IndexWriter. And the way a field
is analyzed is tied to that field.

How about removing the Analyzer from IndexWriter/Config
and putting all analysis information into the Field, something like

new TextField(...) // as keyword
new TextField(..., Analyzer, AnalyzingMode) // analyzed

or better

new TextField(..., AnalyzingMode.AS_IS) // as keyword
new TextField(..., new AnalyzingMode(Analyzer, ...)) // analyzed
new TextField(..., AnalyzingMode.STANDARD) // sugar

Then in the public API for IndexWriter there may be no need to use
- PerFieldAnalyzerWrapper
- Field.Index.NO
- KeywordAnalyzer

This would also settle the not-so-easy question of why and how to construct a
(field-aware) analyzer as a parameter for IndexWriter/Config.
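
To make the proposal concrete, here is a minimal, self-contained sketch of a field that carries its own analysis mode. Everything in it is hypothetical: the nested TextField and AnalyzingMode classes only mirror the names used in the comment above, they are not Lucene classes, and the STANDARD mode is a toy lowercase-and-split tokenizer standing in for a real Analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class FieldAnalysisSketch {

    /** Hypothetical: how a field's text becomes index terms. Not a Lucene class. */
    static final class AnalyzingMode {
        /** Index the whole value as a single term (the "keyword" case). */
        static final AnalyzingMode AS_IS = new AnalyzingMode(text -> List.of(text));

        /** Toy stand-in for a real analyzer: lowercase, split on non-word characters. */
        static final AnalyzingMode STANDARD = new AnalyzingMode(text -> {
            List<String> terms = new ArrayList<>();
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    terms.add(token);
                }
            }
            return terms;
        });

        private final Function<String, List<String>> tokenizer;

        AnalyzingMode(Function<String, List<String>> tokenizer) {
            this.tokenizer = tokenizer;
        }

        List<String> analyze(String text) {
            return tokenizer.apply(text);
        }
    }

    /** Hypothetical field that owns its analysis mode, so the writer needs no Analyzer. */
    static final class TextField {
        final String name;
        final String value;
        final AnalyzingMode mode;

        TextField(String name, String value, AnalyzingMode mode) {
            this.name = name;
            this.value = value;
            this.mode = mode;
        }

        List<String> terms() {
            return mode.analyze(value);
        }
    }

    public static void main(String[] args) {
        TextField key = new TextField("key", "LUCENE-4369", AnalyzingMode.AS_IS);
        TextField body = new TextField("body", "StringField bypasses your Analyzer",
                AnalyzingMode.STANDARD);
        System.out.println(key.terms());  // [LUCENE-4369]
        System.out.println(body.terms()); // [stringfield, bypasses, your, analyzer]
    }
}
```

With the mode attached to the field, a per-field wrapper around one global analyzer has nothing left to do in this sketch; whether that carries over to the real IndexWriter API is exactly the open question raised above.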


> StringFields name is unintuitive and not helpful
> 
>
> Key: LUCENE-4369
> URL: https://issues.apache.org/jira/browse/LUCENE-4369
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-4369.patch
>
>
> There's a huge difference between TextField and StringField: StringField 
> screws up scoring and bypasses your Analyzer.
> (see java-user thread "Custom Analyzer Not Called When Indexing" as an 
> example.)
> The name we use here is vital, otherwise people will get bad results.
> I think we should rename StringField to MatchOnlyField.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2617) Support git.

2011-06-24 Thread Stefan Trcek (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054302#comment-13054302
 ] 

Stefan Trcek commented on SOLR-2617:


> For .gitignore, I prefer to generate it automatically assuming the git repo 
> is git-svn based, however that didn't work

Since a git mirror is sufficient for making patches, I suggest adding .gitignore 
to the repo, as this enables the use of a git mirror without git-svn.


> Support git.
> 
>
> Key: SOLR-2617
> URL: https://issues.apache.org/jira/browse/SOLR-2617
> Project: Solr
> Issue Type: New Feature
> Components: Build
> Reporter: David Smiley
>
> Apache has git mirrors of Lucene/Solr, as well as many other projects. 
> Presently, if git is used to checkout Lucene/Solr, there are only a couple 
> small problems to address, but it otherwise works fine.
> * a .gitignore is needed.
> * empty directories need to be dealt-with. (git doesn't support them)




[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-22 Thread Stefan Trcek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053361#comment-13053361
 ] 

Stefan Trcek commented on LUCENE-3079:
--

This patch was generated by git and tested to apply with
patch -p0 -i LUCENE-3079.patch --dry-run
Please bear with me if anything went wrong.

Good starting points for a review may be
- FacetSearcherTest.testSimpleFacetWithIndexSearcher() or
- FacetSearcher.facetCollectSearch()

Functions.java could be dropped in favor of Guava.
If you would rather keep it, I'll strip it down to the required parts.

--

The implementation relies on the field cache only: no index scheme, no 
cached filters, etc. It supports
- single-valued facets (Facet.java)
- multi-valued facets (Facet.MultiValued.java)
- facet filters (see FacetSearcher.java)
- evaluation of facet values that would otherwise drop out because of other 
facet filters (Yonik says Solr calls this "multi-select faceting"), 
implemented by FacetSearcher.fillFacetsForGuiMode()

Let me explain the last point. To the user, a facet query
  (color==green) AND (shape==circle OR shape==square)
may look like

Facet color
[ ] (3) red
[x] (5) green
[ ] (7) blue

Facet shape
[x] (9) circle
[ ] (4) line
[x] (2) square

The red/blue/line facet values are displayed even though the 
corresponding documents are not in the result set. There is also 
support for filtered facet values with zero results, so users 
can see why they get no results.
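
The counting rule behind such a display can be sketched as follows: the counts for one facet are taken over the documents that pass the filters of all other facets, ignoring that facet's own selection. This is only an illustration of the idea, not the code in the patch; the class and method names are made up, documents are plain maps, and only single-valued facets are handled.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class MultiSelectFacets {

    /**
     * Counts facet values. A document contributes to a facet's counts when it
     * passes the selections of every OTHER facet (hypothetical helper, not
     * taken from the patch).
     */
    static Map<String, Map<String, Integer>> count(
            List<Map<String, String>> docs,
            Map<String, Set<String>> selected) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            for (Map.Entry<String, String> field : doc.entrySet()) {
                String facet = field.getKey();
                if (matchesAllExcept(doc, selected, facet)) {
                    counts.computeIfAbsent(facet, k -> new TreeMap<>())
                          .merge(field.getValue(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** True if the document satisfies every facet selection other than 'skip'. */
    static boolean matchesAllExcept(Map<String, String> doc,
                                    Map<String, Set<String>> selected,
                                    String skip) {
        for (Map.Entry<String, Set<String>> filter : selected.entrySet()) {
            if (filter.getKey().equals(skip)) {
                continue; // a facet never filters its own counts
            }
            String value = doc.get(filter.getKey());
            if (value == null || !filter.getValue().contains(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("color", "green", "shape", "circle"),
                Map.of("color", "green", "shape", "line"),
                Map.of("color", "red", "shape", "circle"),
                Map.of("color", "blue", "shape", "square"),
                Map.of("color", "blue", "shape", "line"));
        Map<String, Set<String>> selected = Map.of(
                "color", Set.of("green"),
                "shape", Set.of("circle", "square"));
        // prints {color={blue=1, green=1, red=1}, shape={circle=1, line=1}}
        System.out.println(count(docs, selected));
    }
}
```

In this toy data set only one document (green circle) is in the result set, yet red, blue and line still get counts because each facet is counted without its own filter. The sketch does not emit zero-count entries, so the selected value square, which has no green document, is simply absent from the output.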


> Faceting module
> 
>
> Key: LUCENE-3079
> URL: https://issues.apache.org/jira/browse/LUCENE-3079
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: LUCENE-3079.patch
>
>
> Faceting is a hugely important feature, available in Solr today but
> not [easily] usable by Lucene-only apps.
> We should fix this, by creating a shared faceting module.
> Ideally, we factor out Solr's faceting impl, and maybe poach/merge
> from other impls (eg Bobo browse).
> Hoss describes some important challenges we'll face in doing this
> (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
> {noformat}
> To look at "faceting" as a concrete example, there are big reasons 
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that no one has had time to tackle)
> {noformat}
> This is a great list of the things we face in refactoring.  It's also
> important because, if Solr needed to be so deeply intertwined with
> caching, schema, etc., other apps that want to facet will have the
> same "needs" and so we really have to address them in creating the
> shared module.
> I think we should get a basic faceting module started, but should not
> cut Solr over at first.  We should iterate on the module, fold in
> improvements, etc., and then, once we can fully verify that cutting
> over doesn't hurt Solr (ie lose functionality or performance) we can
> later cut over.




[jira] [Updated] (LUCENE-3079) Faceting module

2011-06-22 Thread Stefan Trcek (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Trcek updated LUCENE-3079:
-

Attachment: LUCENE-3079.patch

