[jira] [Updated] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3041:
---
Attachment: LUCENE-3041.patch

Updated patch. This simplifies the hierarchy a lot. DispatchingQueryProcessor is merged into QueryProcessor, which then becomes an abstract class. QueryProcessor now has #dispatchProcessing(Query), which is the entry point to the dispatching process. DefaultQueryProcessor is changed to RewriteCachingQueryProcessor, which caches the rewriting of queries. This could be extended further to provide special support for BooleanQuery. Remaining to do is to provide a test which illustrates walking through a complex Query.

Support Query Visting / Walking
---
Key: LUCENE-3041
URL: https://issues.apache.org/jira/browse/LUCENE-3041
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Chris Male
Priority: Minor
Attachments: LUCENE-3041.patch, LUCENE-3041.patch

Out of the discussion in LUCENE-2868, it could be useful to add a generic Query Visitor / Walker that could be used for more advanced rewriting, optimizations, or anything that requires state to be stored as each Query is visited. We could keep the interface very simple:

{code}
public interface QueryVisitor {
  Query visit(Query query);
}
{code}

and then use a reflection-based visitor like Earwin suggested, which would allow implementors to provide visit methods for just the Query types they are interested in.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
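The reflection-based dispatch Earwin suggested could look roughly like the following. This is a standalone sketch with hypothetical class names, not the attached patch: the dispatcher climbs the query's class hierarchy looking for the most specific visit(...) overload the visitor provides.

```java
import java.lang.reflect.Method;

// Stand-ins for Lucene's Query hierarchy, just for this sketch.
abstract class Query {}

class TermQuery extends Query {
    final String term;
    TermQuery(String term) { this.term = term; }
}

class FuzzyQuery extends Query {
    final String term;
    FuzzyQuery(String term) { this.term = term; }
}

// Dispatcher that looks for the most specific visit(...) overload on the
// visitor, walking up the query's class hierarchy until one is found.
class ReflectiveDispatcher {
    static Query dispatch(Object visitor, Query query) {
        Class<?> c = query.getClass();
        while (c != null && Query.class.isAssignableFrom(c)) {
            try {
                Method m = visitor.getClass().getMethod("visit", c);
                m.setAccessible(true);
                return (Query) m.invoke(visitor, query);
            } catch (NoSuchMethodException e) {
                c = c.getSuperclass(); // try the next-more-general overload
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
        return query; // no matching visit method: leave the query untouched
    }
}

// A visitor that only cares about FuzzyQuery; everything else falls through.
class FuzzyOnlyVisitor {
    public Query visit(FuzzyQuery query) {
        return new TermQuery(query.term); // pretend "rewrite"
    }
}
```

With this shape, an implementor writes only the overloads they care about; any Query type without a matching overload is returned unchanged.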
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025669#comment-13025669 ]

Simon Willnauer commented on LUCENE-3041:
---

Chris, nice simplification. I have one question: let's say we have a boolean query OR(AND(Fuzzy:A, Fuzzy:B), AND(Fuzzy:A, Fuzzy:C)). How would it be possible with the current patch to reuse the rewrite for Fuzzy:A? As far as I can see, if I don't rewrite the boolean query myself, the current patch rewrites the top-level query and returns, right? So somehow it must be possible to walk down the query AST. Or am I missing something?
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025677#comment-13025677 ]

Chris Male commented on LUCENE-3041:
---

No, you didn't miss anything. The RewriteCachingQueryProcessor currently only rewrites the top-level query. It needs to be extended to handle BooleanQuerys and any other composite query (BoostingQuery, for example). I might actually add a DefaultQueryProcessor again which walks the full Query AST by default, and then have RewriteCachingQueryProcessor extend it and cache. I'll iterate a new patch.
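The behavior being discussed — walk the whole AST and reuse the rewrite of an identical leaf such as Fuzzy:A — can be sketched outside Lucene. This is an illustrative model (toy Query classes, not the patch's code): composite queries are recursed into, and leaf rewrites are cached by query equality.

```java
import java.util.*;

// Minimal stand-ins for Lucene's Query classes, just for illustration.
abstract class Query {}

final class FuzzyQuery extends Query {
    final String term;
    FuzzyQuery(String term) { this.term = term; }
    public boolean equals(Object o) {
        return o instanceof FuzzyQuery && ((FuzzyQuery) o).term.equals(term);
    }
    public int hashCode() { return term.hashCode(); }
}

final class TermQuery extends Query {
    final String term;
    TermQuery(String term) { this.term = term; }
}

final class BooleanQuery extends Query {
    final List<Query> clauses;
    BooleanQuery(Query... clauses) { this.clauses = Arrays.asList(clauses); }
}

// A processor that walks the full query AST and caches leaf rewrites, so
// Fuzzy:A nested in several clauses is rewritten only once.
class RewriteCachingQueryProcessor {
    private final Map<Query, Query> cache = new HashMap<>();
    int rewriteCalls = 0; // counts actual (non-cached) rewrites

    Query process(Query query) {
        if (query instanceof BooleanQuery) {
            BooleanQuery bq = (BooleanQuery) query;
            Query[] rewritten = new Query[bq.clauses.size()];
            for (int i = 0; i < rewritten.length; i++) {
                rewritten[i] = process(bq.clauses.get(i));
            }
            return new BooleanQuery(rewritten);
        }
        return cache.computeIfAbsent(query, this::rewrite);
    }

    // Pretend rewrite: a fuzzy query "expands" to a plain term query.
    private Query rewrite(Query query) {
        rewriteCalls++;
        if (query instanceof FuzzyQuery) {
            return new TermQuery(((FuzzyQuery) query).term);
        }
        return query;
    }
}
```

On Simon's example OR(AND(Fuzzy:A, Fuzzy:B), AND(Fuzzy:A, Fuzzy:C)), such a processor performs three rewrites (A, B, C) rather than four, since the second Fuzzy:A hits the cache.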
[jira] [Created] (LUCENE-3047) HyphenationCompoundWordTokenFilter does not work correctly with the German word Brustamputation
HyphenationCompoundWordTokenFilter does not work correctly with the German word Brustamputation
---
Key: LUCENE-3047
URL: https://issues.apache.org/jira/browse/LUCENE-3047
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 3.1
Environment: Linux 2.6.32-31-generic, java version 1.6.0_21, Java(TM) SE Runtime Environment (build 1.6.0_21-b06), Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
Reporter: Lars Feistner
Priority: Minor

The following test fails:

{code}
@Test
public void testBrustamputation() throws IOException {
  Analyzer compoundAnalyzer = new Analyzer() {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
      InputStream in = this.getClass().getResourceAsStream("/de_DR.xml");
      final InputSource inputSource = new InputSource(in);
      inputSource.setEncoding("iso-8859-1");
      HyphenationTree hyphenator = null;
      try {
        hyphenator = HyphenationCompoundWordTokenFilter.getHyphenationTree(inputSource);
      } catch (Exception ex) {
        Assert.fail("", ex);
      }
      HashSet<String> dict = new HashSet<String>(Arrays.asList(new String[]{"brust", "amputation"}));
      return new HyphenationCompoundWordTokenFilter(Version.LUCENE_31,
          new WhitespaceTokenizer(Version.LUCENE_31, reader), hyphenator, dict,
          CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE, 4,
          CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE, false);
    }
  };
  TokenStream tokenStream = compoundAnalyzer.tokenStream("Kurztext", new StringReader("brustamputation"));
  CharTermAttribute t = tokenStream.addAttribute(CharTermAttribute.class);
  Set<String> tokenSet = new HashSet<String>();
  while (tokenStream.incrementToken()) {
    tokenSet.add(t.toString());
    System.out.println(t);
  }
  Assert.assertTrue(tokenSet.contains("brust"), "brust");
  Assert.assertTrue(tokenSet.contains("brustamputation"), "brustamputation");
  Assert.assertTrue(tokenSet.contains("amputation"), "amputation");
}
{code}
Re: modularization discussion
On Tue, Apr 26, 2011 at 11:34 PM, Yonik Seeley yo...@lucidimagination.com wrote:

On Tue, Apr 26, 2011 at 11:07 PM, Robert Muir rcm...@gmail.com wrote: It appears there are some problems with modularization of the code, especially between lucene and solr, so I would like for us to have a discussion on this thread. The specifics of each case matter of course.

I agree. Some of the refactored code has been changed to use the lucene namespace, and it seems only fair that other code that has traditionally been the domain of Solr keep the solr namespace. This helps keep the proper mindset that code is not being "moved from solr to lucene", as too many people keep putting it, but is being exposed to lucene users and is now shared.

Why impose namespace restrictions based on where code was originally committed? I think the namespace of refactored code should reflect the nature of the code, not its origins.

For example, when I refactored UnInvertedField, it split nicely into a Solr piece and a core Lucene piece, and so I gave the core Lucene piece the org.apache.lucene.index namespace.

I think leaving refactored code in the solr namespace sends the wrong message (ie, that this module depends on Solr somehow). The lucene namespace makes it clear that it only depends on Lucene. Eg, the patch on LUCENE-2995 (consolidating our various spell/suggest impls) also consolidates everything under the lucene namespace, which I think makes sense?

Mike

http://blog.mikemccandless.com
Re: modularization discussion
On Tue, Apr 26, 2011 at 11:41 PM, Grant Ingersoll gsing...@apache.org wrote:

I think this needs a bit more explanation. AIUI, the primary cause for concern is that by making something a module, you are taking a private, internal API of Solr's and now making it a public API that must be maintained (and backwards maintained), which could slow down development, as one now needs to be concerned with more factors than you would if it were merely an implementation detail in Solr.

This concern doesn't make sense to me: if we mark a module experimental, we are fully free to change it, even drastically. Pre-merge, I agree, it was a nightmare factoring code across projects... but now that we are merged, and now that we have @experimental, I don't understand this argument.

Maybe we can take a concrete example, eg LUCENE-2995 (the factored-out suggest module): how does this being its own module hurt Solr?

Mike

http://blog.mikemccandless.com
Re: Filters with 2.9.4
Hi Uwe,

Thanks for the reply. Things are a bit tangled, because I've used early Solr stuff with DocSet and have extensively used my own caching Filters, because I couldn't get what I wanted with the standard versions a few years ago. It will take a while to undo that, but I'm working towards it.

However, it still seems to me that the Filter.getDocIdSet() method should also be given the docBase for the given reader. It seems odd that the Collector has that knowledge but the Filter does not, even though they are pretty closely related classes. What do you think?

Antony

On 19/04/2011 5:01 PM, Uwe Schindler wrote:

Hi Antony, Why not use CachingWrapperFilter together with a TermsFilter or QueryWrapperFilter(TermQuery)? This Filter keeps track of all used segment readers. So you build an instance: Filter f = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term(...)))); And reuse that filter instance with all queries the user starts. No need to hack the cache yourself. The above variant is much more effective as it works better with reopen()'ed index readers (after the index changed), because it reuses the unchanged segment readers.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Antony Bowesman [mailto:a...@thorntothehorn.org]
Sent: Tuesday, April 19, 2011 7:30 AM
To: Lucene Dev
Subject: Filters with 2.9.4

Hi, Another migrate-to-2.9.4 issue for me... When a search is done by a user, I collect a 'DocSet' of Documents for that 'owner' (Term(id, XX)). This is a single set for all Documents in the index and NOT per reader. Then when searches are made I use caching Filters, but I use my master DocSet as a Filter for those chained Filters. However, with 2.9, Filters are now called per segment reader and there's a DocIdSet for each Reader. There is no way for the filter implementation to know the docBase for the passed reader, like the collector does. As the Javadocs for Filter.getDocIdSet imply, a Filter must only return doc ids for the given reader. I am now stuck with a filter implementation that can no longer intersect the master bitset for my 'owners'. Was this envisaged during the changes, and is there a way I can get hold of the docBase for an IndexReader?

Thanks
Antony
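The mismatch Antony describes — a top-level bitset versus per-segment doc ids — comes down to the docBase offset. This plain-Java sketch (not Lucene's API; names are illustrative) shows the relationship: a segment's docBase is the sum of maxDoc() of all preceding segments, and a top-level bitset can be sliced down to segment-relative doc ids once the docBase is known.

```java
import java.util.*;

// Standalone sketch: with per-segment readers, a doc id is segment-relative;
// the top-level doc id is docBase + segmentDocId, where docBase is the sum of
// maxDoc() of all preceding segments. A filter holding a top-level bitset
// could slice it down per segment like this.
class SegmentSlicer {
    // maxDocs[i] = number of docs in segment i
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = base;
            base += maxDocs[i];
        }
        return bases;
    }

    // Translate the top-level bitset into segment-relative doc ids.
    static BitSet slice(BitSet topLevel, int docBase, int maxDoc) {
        BitSet segment = new BitSet(maxDoc);
        for (int doc = topLevel.nextSetBit(docBase);
             doc >= 0 && doc < docBase + maxDoc;
             doc = topLevel.nextSetBit(doc + 1)) {
            segment.set(doc - docBase);
        }
        return segment;
    }
}
```

The catch, as the thread goes on to explain, is that in Lucene 2.9/3.x the Filter is not handed the docBase, which is why the recommended fix is to cache per segment instead of per top-level index.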
RE: Filters with 2.9.4
Hi,

In Lucene trunk the Filter gets a ReaderContext, which contains a doc base if available. For Lucene 2 and 3 this is not available.

The Lucene 2.9 code did not change documented behavior. The fact that Filters always got the top-level reader was never documented (it was just like that in early Lucene versions), so this is no break. The same applies not only to Filters; it also applies to Scorers created by Queries. Those also don't know anything about the top-level searcher (and they don't need to). For a Filter to work, this is also not a requirement: the IndexReader passed as parameter is self-contained and provides all information for processing the current segment.

You should simply fix your caching (which is much more effective after this change, as the cache items don't become invalid after a reopen of an index where only a few segments changed). I would suggest correcting your code and using CachingWrapperFilter.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
[jira] [Created] (SOLR-2479) Phrase (arbitrary delimiter) based autocomplete
Phrase (arbitrary delimiter) based autocomplete
---
Key: SOLR-2479
URL: https://issues.apache.org/jira/browse/SOLR-2479
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
Fix For: 4.0

Much like the one described here by Google: http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogspot%2FMKuf+%28Official+Google+Blog%29

My idea was to allow arbitrary delimiters -- then infix suggestions would also be possible (although these are _not_ of much practical importance and relatively few geeks would find them useful :).
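One common way to get infix suggestions out of a prefix-only lookup — sketched here as a standalone illustration, not SOLR-2479's implementation — is to split each suggestion on the chosen delimiter and index every suffix that starts at a delimiter boundary, so an ordinary prefix match on any suffix surfaces the full phrase.

```java
import java.util.*;

// Sketch of the suffix-expansion idea for delimiter-based infix suggestions.
class SuffixIndexer {
    // For "new york city" with delimiter " ", produce:
    // ["new york city", "york city", "city"] - index all of them, and a
    // prefix lookup of "yor" then finds the phrase via its "york city" entry.
    static List<String> suffixesAtBoundaries(String phrase, String delimiter) {
        List<String> out = new ArrayList<>();
        String[] parts = phrase.split(java.util.regex.Pattern.quote(delimiter));
        for (int i = 0; i < parts.length; i++) {
            out.add(String.join(delimiter,
                    Arrays.copyOfRange(parts, i, parts.length)));
        }
        return out;
    }
}
```

The cost is index size (one entry per delimiter boundary per suggestion), which is one reason infix support tends to be optional.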
Re: modularization discussion
On Apr 27, 2011, at 12:14 AM, Robert Muir wrote:

Can we solve this? It seems like lucene users currently only have this choice: A. no access to feature X at all. But couldn't they at least have this choice: A. no access to feature X at all; B. access to the feature, but with relaxed backwards compatibility to address the concern. In other words, we could mark the API @experimental or whatever, and the user can choose not to use it at the lucene level if they don't want to deal with upgrade hassles.

Honestly, too much fighting to see the trees through the forest. Yonik has compromised on pretty much every module brought up: if it's not stated as "this feature is going to Lucene", if it goes to a module, and if the module can have similar requirements as the code had in Solr, then he's okay with it. To him it's very important that some of this stuff comes off as shared between Lucene/Solr and not just Lucene's. That's what I have gathered, anyway. Fine by me.

My memory is that Yonik has never been steadfast against modules. He has tried to negotiate what he thinks is best in terms of this stuff. The breakdown comes from the personalities involved. No one has been willing to swim to the end, because it's hard work. Well, some things are hard work. I say get used to it. I am.

The problem is that Simon says things like "everything should be a module and solr should just be sugar on Lucene". That scares Yonik. Then Yonik makes comments questioning individual modules. That scares the other guys. Both sides retreat to their corners. Fantastic. Yes, there is a middle ground - I've seen it swirl around and disappear back into the blood a few times. These volatile personalities are just not finding it.

- Mark

-
Mark Miller
lucidimagination.com
Lucene/Solr User Conference May 25-26, San Francisco
www.lucenerevolution.org
Re: Filters with 2.9.4
Thanks Uwe. I'll work towards the CachingWrapperFilter.

Antony
Re: modularization discussion
On Wed, Apr 27, 2011 at 8:13 AM, Mark Miller markrmil...@gmail.com wrote: The problem is that Simon says things like "everything should be a module and solr should just be sugar on Lucene". That scares Yonik. Then Yonik makes comments questioning individual modules. That scares the other guys. Both sides retreat to their corners.

Why? In the best interest of the project, what are the reasons why this is a bad thing? Then users could access Solr's features from the API.
Re: bug in LuceneTestCase#TEST_MIN_ITER
Fixed the behavior in Revision: 1097097

simon

On Tue, Apr 26, 2011 at 6:14 PM, Shai Erera ser...@gmail.com wrote: I think you're right, Simon! Obviously I didn't test it with that scenario in mind :).

Shai

On Tue, Apr 26, 2011 at 6:15 PM, Simon Willnauer simon.willna...@googlemail.com wrote:

Hey, I wonder how this TEST_MIN_ITER feature works. I expect that if I set -Dtests.iter.min=1 -Dtests.iter=10 and I fail in any of those iterations, the runner stops immediately and prints a failure. Is that correct? If so, I don't understand this code:

if (testsFailed) {
  lastIterFailed = i;
  if (i == TEST_ITER_MIN - 1) {
    if (verbose) {
      System.out.println("\nNOTE: iteration " + lastIterFailed + " failed!");
    }
    break;
  }
}

This only stops if it fails at tests.iter.min, but not if it fails at tests.iter.min+1. This should rather be something like if (i >= TEST_ITER_MIN - 1), right?

simon
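The fix Simon describes can be demonstrated in isolation. This is a standalone model of the iteration loop (illustrative names, not the actual LuceneTestCase code): with the original `i == TEST_ITER_MIN - 1` check, a failure at any later iteration never breaks the loop, whereas `i >= TEST_ITER_MIN - 1` stops at the first failure once the minimum number of iterations has run.

```java
// Toy model of the test-iteration loop under discussion.
class IterRunner {
    // Runs up to testIter iterations; a test failure occurs at index failAt.
    // Returns how many iterations actually ran with the fixed condition.
    static int runUntilFailure(int testIter, int testIterMin, int failAt) {
        int iterations = 0;
        boolean testsFailed = false;
        for (int i = 0; i < testIter; i++) {
            iterations++;
            if (i == failAt) {
                testsFailed = true;
            }
            // Fixed condition: stop at the first failure at or past the
            // minimum iteration count (the buggy version used == here).
            if (testsFailed && i >= testIterMin - 1) {
                break;
            }
        }
        return iterations;
    }
}
```

With tests.iter=10 and tests.iter.min=1, a failure at iteration 3 now stops the run after 4 iterations; the `==` version would have run all 10.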
RE: modularization discussion
if its not stated as this feature is going to Lucene It seems as though some people assume that since Lucene is a library, and Solr is an application, that exposing Solr API *means* making it part of Lucene. It ain't necessarily so, and it need not be a point of contention. I want to reiterate my opinion (voiced pre-merge) that there be a third entity here besides Solr and Lucene. E.g., if modules/ became thirdentity/, with its own org.apache.thirdentity namespace, wouldn't questions of ownership/control mostly go away? Steve
Re: modularization discussion
On Wed, Apr 27, 2011 at 6:28 AM, Michael McCandless luc...@mikemccandless.com wrote: Why impose namespace restrictions based on where code was originally committed? I think the namespace of refactored code should reflect the nature of the code, not its origins?

And if it's a very core part of Solr that we've tended to hang a lot of new features on, etc., then the nature of that code should still hopefully be solrish.

For example, when I refactored UnInvertedField, it split nicely into a Solr piece and a core Lucene piece, and so I gave the core Lucene piece the org.apache.lucene.index namespace.

That's because it was factored directly into Lucene core, not into a module.

I think leaving refactored code in the solr namespace sends the wrong message (ie, that this module depends on Solr somehow). The lucene namespace makes it clear that it only depends on Lucene.

But that won't be true... it's likely that many modules will depend on other modules. But as I said, it seems only fair to meet half way and use the solr namespace for some modules and the lucene namespace for others.

-Yonik
[jira] [Updated] (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Otto updated SOLR-236:
---
Comment: was deleted (was: Am I right that trunk is 4.0? What is the newest patch that works on that code? All patches I tried so far failed for me. Also, would someone be able to share a solr.WAR file that is already patched and fairly up-to-date? Thanks)

Field collapsing
Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
Fix For: Next
Attachments: DocSetScoreCollector.java, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, solr-236.patch

This patch includes a new feature called "field collapsing", used to collapse a group of results with a similar value for a given field into a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also "duplicate detection": http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field to choose the field used to group results
- collapse.type: normal (default value) or adjacent
- collapse.max to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for the current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)
RE: modularization discussion
On 4/27/2011 at 9:25 AM, Yonik wrote: it seems only fair to meet half way and use the solr namespace for some modules and the lucene namespace for others. Let's eliminate a source of conflict, and make modules another product that is neither Lucene nor Solr. Steve
[jira] [Resolved] (SOLR-2272) Join
[ https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-2272.
---
Resolution: Fixed

Join
Key: SOLR-2272
URL: https://issues.apache.org/jira/browse/SOLR-2272
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Yonik Seeley
Fix For: 4.0
Attachments: SOLR-2272.patch, SOLR-2272.patch, SOLR-2272.patch

Limited join functionality for Solr, mapping one set of IDs matching a query to another set of IDs, based on the indexed tokens of the fields. Example: fq={!join from=parent_ptr to=parent_id}child_doc:query
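The "mapping one set of IDs to another" idea behind the join can be sketched with a toy in-memory model (illustrative only, not Solr's implementation): collect the values of the `from` field for documents matching the inner query, then select the documents whose `to` field holds one of those values.

```java
import java.util.*;

// Toy illustration of the join concept: parent_ptr -> parent_id mapping.
class ToyJoin {
    // docs: one field-name -> value map per document; doc id = list index.
    // matching: ids of documents matched by the inner query.
    static Set<Integer> join(List<Map<String, String>> docs,
                             Set<Integer> matching, String from, String to) {
        // 1. Gather the "from" field values of the matching documents.
        Set<String> keys = new HashSet<>();
        for (int doc : matching) {
            String v = docs.get(doc).get(from);
            if (v != null) keys.add(v);
        }
        // 2. Return every document whose "to" field carries one of them.
        Set<Integer> result = new HashSet<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            if (keys.contains(docs.get(doc).get(to))) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

So `fq={!join from=parent_ptr to=parent_id}child_doc:query` first runs `child_doc:query`, takes the matched children's `parent_ptr` tokens, and keeps the documents whose `parent_id` matches one of them.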
[jira] [Created] (LUCENE-3048) Improve BooleanQuery rewrite documentation
Improve BooleanQuery rewrite documentation
--
Key: LUCENE-3048
URL: https://issues.apache.org/jira/browse/LUCENE-3048
Project: Lucene - Java
Issue Type: Improvement
Components: Query/Scoring
Reporter: Chris Male
Priority: Minor

While looking over BooleanQuery#rewrite, I found a couple of things confusing: why, in the case of a single clause, the boost is set as it is, and what's going on with the lazy initialisation of the cloned BooleanQuery. I'm just adding a few lines of documentation to both situations to clarify this.
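For context on the single-clause case the issue mentions, here is a standalone sketch (illustrative class names, not the actual BooleanQuery code) of why the boost is set the way it is: when a BooleanQuery wraps exactly one clause, rewrite can return the inner query directly, but only after folding the wrapper's boost into it.

```java
import java.util.*;

// Minimal stand-ins; note that real Lucene clones the inner query before
// changing its boost, so the original query object stays untouched.
class Query {
    float boost = 1.0f;
}

class SingleClauseBooleanQuery extends Query {
    final List<Query> clauses = new ArrayList<>();

    Query rewrite() {
        if (clauses.size() == 1) {
            Query inner = clauses.get(0);
            // Fold the wrapper's boost into the returned clause; otherwise
            // the boost set on the BooleanQuery itself would be silently lost.
            inner.boost *= this.boost;
            return inner;
        }
        return this; // the multi-clause path is not modeled in this sketch
    }
}
```

A BooleanQuery with boost 2.0 around a clause with boost 3.0 thus rewrites to that clause carrying boost 6.0, which is the non-obvious step the extra documentation explains.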
[jira] [Updated] (LUCENE-3048) Improve BooleanQuery rewrite documentation
[ https://issues.apache.org/jira/browse/LUCENE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3048:
---
Attachment: LUCENE-3048.patch

Patch adding comments as mentioned.
[jira] [Assigned] (LUCENE-3048) Improve BooleanQuery rewrite documentation
[ https://issues.apache.org/jira/browse/LUCENE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer reassigned LUCENE-3048:
---
Assignee: Simon Willnauer
[jira] [Commented] (LUCENE-3048) Improve BooleanQuery rewrite documentation
[ https://issues.apache.org/jira/browse/LUCENE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025819#comment-13025819 ] Simon Willnauer commented on LUCENE-3048: - looks useful chris! I will commit it, thanks!
[jira] [Resolved] (LUCENE-3048) Improve BooleanQuery rewrite documentation
[ https://issues.apache.org/jira/browse/LUCENE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3048. - Resolution: Fixed
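The single-clause case the issue documents can be illustrated with a small stand-alone sketch. Note this is an analogy, not Lucene's actual BooleanQuery code: the class Q below is a hypothetical stand-in for a query carrying a boost. The point is that when a BooleanQuery holds exactly one clause, rewrite can return that clause's query directly, provided the parent query's boost is multiplied into the surviving query so scoring stays unchanged.

```java
// Hedged sketch of the single-clause rewrite discussed in LUCENE-3048.
// "Q" is a hypothetical stand-in for a Lucene Query with a boost; it is
// not Lucene's actual API.
public class SingleClauseRewriteSketch {
    static class Q {
        float boost;
        Q(float boost) { this.boost = boost; }
    }

    // A BooleanQuery with a single clause can rewrite to the clause itself,
    // but the parent's boost must be folded in, otherwise the rewritten
    // query would score differently from the original.
    static Q rewriteSingleClause(Q booleanQuery, Q onlyClause) {
        onlyClause.boost *= booleanQuery.boost;
        return onlyClause;
    }

    public static void main(String[] args) {
        Q parent = new Q(2.0f);
        Q clause = new Q(3.0f);
        Q rewritten = rewriteSingleClause(parent, clause);
        System.out.println(rewritten.boost); // 6.0
    }
}
```

Dropping the wrapper without combining the boosts would silently change ranking, which is presumably why the real code sets the boost the way it does.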
Re: modularization discussion
On Wed, Apr 27, 2011 at 9:25 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Apr 27, 2011 at 6:28 AM, Michael McCandless luc...@mikemccandless.com wrote: Why impose namespace restrictions based on where code was originally committed? I think the namespace of refactored code should reflect the nature of the code, not its original origins? And if it's a very core part of solr that we've tended to hang a lot of new features on, etc, then the nature of that code should still hopefully be solrish. I'm confused... aren't they all solrish? Like, of the refactorings on the table, which ones are not solrish? Is the real issue here that you want Solr's name to live on no matter how this code is refactored in the future? For example, when I refactored UnInvertedField, it split nicely into a Solr piece and a core Lucene piece, and so I gave the core Lucene piece the org.apache.lucene.index namespace. That's because it was factored directly into Lucene-core, not into a module. OK. I think leaving refactored code in the solr namespace sends the wrong message (ie, that this module depends on Solr somehow). The lucene namespace makes it clear that it only depends on Lucene. But that won't be true... it's likely that many modules will depend on other modules. Sure but that's fine? Each layer can depend on other stuff in its layer, or on stuff in the lower (more core) layers. Solr depends on Solr stuff and modules and Lucene core. Modules depend on other modules and Lucene core. But as I said... it seems only fair to meet halfway and use the solr namespace for some modules and the lucene namespace for others. Actually I think a whole new namespace (Steven's suggestion) is a great idea? Would that work? (Else we'll be arguing on every module refactoring what namespace it should take...). Or, I would also be fine with naming all modules factored out of solr under the solr namespace, as long as we make it clear that you can use them w/o the rest of Solr. 
Are there other (technical) objections to ongoing refactoring besides this namespace problem? Mike http://blog.mikemccandless.com
RE: [Lucene.Net] Lucene.NET 2.9.4g -- only usable with .NET 4.0 ?
Sorry, for now, only 4.0. DIGY -Original Message- From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] Sent: Wednesday, April 27, 2011 6:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] Lucene.NET 2.9.4g -- only usable with .NET 4.0 ? Digy, Am I correct that your trial code changes make this version of Lucene.NET incompatible and un-buildable with any version of .NET prior to 4.0? - Neal
Re: [Lucene.Net] Lucene.NET 2.9.4g -- only usable with .NET 4.0 ?
On 27.04.2011 17:40, Amanuel Workneh wrote: Am I correct that your trial code changes make this version of Lucene.NET incompatible and un-buildable with any version of .NET prior to 4.0? As I understand it, 2.9.4g only replaces non-generic collections with generic ones. Generics were introduced in .NET Framework 2.0. Oh, sorry, I took a look at the code just to make sure. It does use SortedSet, a .NET 4 feature. It also uses HashSet, introduced in .NET 3.5. We could get a copy of these classes from the Mono project: 4.0 collection classes: https://github.com/mono/mono/tree/master/mcs/class/System/System.Collections.Generic 3.5 collection classes: https://github.com/mono/mono/tree/master/mcs/class/System.Core/System.Collections.Generic They are licensed under the MIT/X11 license, which should be compatible with ASF's policy. Robert
[jira] [Created] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer) - Key: LUCENE-3049 URL: https://issues.apache.org/jira/browse/LUCENE-3049 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.1 Reporter: Jonathan Young Calling HHMMSegmenter.process() on a string which is longer than 32767 characters will usually result in a NullPointerException being thrown with the following backtrace: java.lang.NullPointerException at org.apache.lucene.analysis.cn.smart.hhmm.BiSegGraph.getShortPath(BiSegGraph.java:190) at org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter.process(HHMMSegmenter.java:208) The root cause is the declaration of index as a _short_ at line 77 of modules/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java .
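The root cause described above — a position counter declared as a short — can be reproduced in isolation. The sketch below is not the smartcn code itself, just a minimal illustration of why positions past 32767 turn negative (and a negative position later breaks lookups like the one in BiSegGraph.getShortPath):

```java
// Minimal illustration of the overflow behind LUCENE-3049 (not the actual
// SegGraph code): a short counter wraps to a negative value once the input
// passes 32767 characters; declaring the counter as int avoids the wrap.
public class ShortIndexOverflow {
    public static void main(String[] args) {
        short index = Short.MAX_VALUE;    // 32767, the last safe position
        index++;                          // wraps around to -32768
        System.out.println(index);        // -32768

        int fixedIndex = Short.MAX_VALUE; // the fix: use an int counter...
        fixedIndex++;                     // ...which advances to 32768
        System.out.println(fixedIndex);   // 32768
    }
}
```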
[jira] [Updated] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
[ https://issues.apache.org/jira/browse/LUCENE-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Young updated LUCENE-3049: --- Lucene Fields: [New, Patch Available] (was: [New]) Patch attached.
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7488 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7488/ 1 tests failed. REGRESSION: org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2894) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589) at java.lang.StringBuffer.append(StringBuffer.java:337) at java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617) at org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93) at org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304) at org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1097) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1025) Build Log (for compile errors): [...truncated 5263 lines...]
[jira] [Commented] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
[ https://issues.apache.org/jira/browse/LUCENE-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025927#comment-13025927 ] Steven Rowe commented on LUCENE-3049: - Jonathan, FYI, you didn't attach a patch?
[jira] [Resolved] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
[ https://issues.apache.org/jira/browse/LUCENE-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Young resolved LUCENE-3049. Resolution: Duplicate Lucene Fields: [New] (was: [Patch Available, New]) Recently fixed at revision 1092328.
[jira] [Issue Comment Edited] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
[ https://issues.apache.org/jira/browse/LUCENE-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025929#comment-13025929 ] Jonathan Young edited comment on LUCENE-3049 at 4/27/11 6:17 PM: - In preparing the patch, I updated, and then discovered it had already been recently fixed at revision 1092328. was (Author: jyoung): Recently fixed at revision 1092328.
[jira] [Updated] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
[ https://issues.apache.org/jira/browse/LUCENE-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Young updated LUCENE-3049: --- Comment: was deleted (was: Patch attached.)
[jira] [Commented] (SOLR-2400) FieldAnalysisRequestHandler; add information about token-relation
[ https://issues.apache.org/jira/browse/SOLR-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025958#comment-13025958 ] Stefan Matheis (steffkes) commented on SOLR-2400: - Yes =) Ty Uwe, applied the Patch: works perfectly! I've tried splitting on Words, also removing of Stopwords - both are looking good. Will see how we could integrate this -- actually for the normal languages and their analysis .. afterwards for the Japanese one :) FieldAnalysisRequestHandler; add information about token-relation - Key: SOLR-2400 URL: https://issues.apache.org/jira/browse/SOLR-2400 Project: Solr Issue Type: Improvement Components: Schema and Analysis Reporter: Stefan Matheis (steffkes) Priority: Minor Attachments: 110303_FieldAnalysisRequestHandler_output.xml, 110303_FieldAnalysisRequestHandler_view.png, SOLR-2400.patch, SOLR-2400.patch, field.xml The XML output (simplified example attached) is missing one small piece of information which could be very useful for building a nice analysis output: the token-relation (if there is a special/correct word for this, please correct me). Meaning that it is actually not possible to follow the analysis process (completely) when the Tokenizers/Filters drop tokens (f.e. StopWord) or split one into multiple tokens (f.e. WordDelimiter). Would it be possible to include this information? If so, it would be possible to create an improved analysis page for the new Solr admin (SOLR-2399) - short scribble attached
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025961#comment-13025961 ] Robert Muir commented on LUCENE-3023: - I was helping Simon look at reintegrating this branch (producing a patch for easy review, etc), but I found some problems. 1. It looks like some commits were marked as merged from trunk, but not actually merged, so if we reintegrate into trunk we will lose some changes. 2. Some files have lost their svn:eol-style, which makes the comparison difficult. I'm looking at these issues now. Land DWPT on trunk -- Key: LUCENE-3023 URL: https://issues.apache.org/jira/browse/LUCENE-3023 Project: Lucene - Java Issue Type: Task Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3023.patch, realtime-TestAddIndexes-3.txt, realtime-TestAddIndexes-5.txt, realtime-TestIndexWriterExceptions-assert-6.txt, realtime-TestIndexWriterExceptions-npe-1.txt, realtime-TestIndexWriterExceptions-npe-2.txt, realtime-TestIndexWriterExceptions-npe-4.txt, realtime-TestOmitTf-corrupt-0.txt With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324, so we can proceed with landing the DWPT development on trunk soon. I think one of the bigger issues here is to make sure that all JavaDocs for IW etc. are still correct, though. I will start going through that first.
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7501 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7501/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple Error Message: expected:3 but was:2 Stack Trace: junit.framework.AssertionFailedError: expected:3 but was:2 at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175) at org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple(TestLBHttpSolrServer.java:127) Build Log (for compile errors): [...truncated 9069 lines...]
[jira] [Updated] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3023: Attachment: diffMccand.py OK, I think these issues are resolved. I'm attaching the script Mike wrote that I used for checking that we don't lose any changes (I think it's the same script we used for the flex branch). The way I did it is to check out a/ and b/, reintegrate the branch into b/, and run the script to produce a huge patch. If some things look suspicious, like they are lost changes, then I reverse-apply the huge patch to the branch with Eclipse, selectively apply only those lost changes, and then commit.
Re: modularization discussion
On Wed, Apr 27, 2011 at 11:49 AM, Michael McCandless luc...@mikemccandless.com wrote: On Wed, Apr 27, 2011 at 9:25 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Apr 27, 2011 at 6:28 AM, Michael McCandless luc...@mikemccandless.com wrote: Why impose namespace restrictions based on where code was originally committed? I think the namespace of refactored code should reflect the nature of the code, not its original origins? And if it's a very core part of solr that we've tended to hang a lot of new features on, etc, then the nature of that code should still hopefully be solrish. I'm confused... aren't they all solrish? Like, of the refactorings on the table, which ones are not solrish? The benchmarking stuff definitely originated in lucene-land, there was much more lucene analysis than solr analysis in that module consolidation, and non-sandboxish stuff in lucene-contrib that may be refactored/moved to modules. Is the real issue here that you want Solr's name to live on no matter how this code is refactored in the future? For example, when I refactored UnInvertedField, it split nicely into a Solr piece and a core Lucene piece, and so I gave the core Lucene piece the org.apache.lucene.index namespace. That's because it was factored directly into Lucene-core, not into a module. OK. I think leaving refactored code in the solr namespace sends the wrong message (ie, that this module depends on Solr somehow). The lucene namespace makes it clear that it only depends on Lucene. But that won't be true... it's likely that many modules will depend on other modules. Sure but that's fine? Each layer can depend on other stuff in its layer, or on stuff in the lower (more core) layers. Solr depends on Solr stuff and modules and Lucene core. Modules depend on other modules and Lucene core. But my point was the namespace doesn't tell you what the dependencies of the modules are. lucene wouldn't mean that it depends on lucene-core only... 
(and depending on what it is, may not depend on lucene-core at all) and solr wouldn't mean that it depends on solr-core. But as I said... it seems only fair to meet halfway and use the solr namespace for some modules and the lucene namespace for others. Actually I think a whole new namespace (Steven's suggestion) is a great idea? Would that work? (Else we'll be arguing on every module refactoring what namespace it should take...). Or, I would also be fine with naming all modules factored out of solr under the solr namespace, as long as we make it clear that you can use them w/o the rest of Solr. Of course! That's the whole point of refactoring a module out of some solr functionality. Actual dependencies (i.e. which modules depend on which modules) would be TBD of course. Are there other (technical) objections to ongoing refactoring besides this namespace problem? I don't think so in general - as I stated before, w.r.t. LUCENE-2883, later discussions led me to believe there was very little disagreement left (and I actually thought some of us had come to an agreement). -Yonik
[jira] [Updated] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3023: Attachment: LUCENE-3023.patch Attached is the DWPT branch in patch format against trunk (for easier reviewing).
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026069#comment-13026069 ] Robert Muir commented on LUCENE-3023: - What about TestIndexWriter.testIndexingThenDeleting? I noticed in the branch the test method is changed to _testIndexingThenDeleting (disabled). However, if I re-enable it (rename it back) it never seems to finish...
[jira] [Issue Comment Edited] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024966#comment-13024966 ] Lance Norskog edited comment on SOLR-2242 at 4/28/11 2:01 AM: -- From the patch: bq. {{public static final String FACET_NAMEDISTINCT = FACET + ".numFacetTerms";}} So - in this issue, a _name_ is what everything else calls a _term_, and a _value_ is what everyone else calls a _count of documents with *this term* in *this field*_. Please change this in the patch. was (Author: lancenorskog): From the patch: bq. {{public static final String FACET_NAMEDISTINCT = FACET + ".numFacetTerms";}} So - in this issue, a _name_ is what everything else calls a _term_. Please change this in the patch. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. 
Here is an example: http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1 Here is an example on field hgid (without namedistinct): {code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="HGPY045FD36D4000A">1</int>
    <int name="HGPY0FBC6690453A9">1</int>
    <int name="HGPY1E44ED6C4FB3B">1</int>
    <int name="HGPY1FA631034A1B8">1</int>
    <int name="HGPY3317ABAC43B48">1</int>
    <int name="HGPY3A17B2294CB5A">5</int>
    <int name="HGPY3ADD2B3D48C39">1</int>
  </lst>
</lst>
{code} With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39), this returns the number of rows (7), not the number of values (11). {code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="_count_">7</int>
  </lst>
</lst>
{code} This actually works really well to get the total number of fields for a group.field=hgid. Enjoy!
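The distinction the example draws — 7 rows versus 11 total counts — can be sketched with plain collections. This is not Solr's faceting code, just an illustration using the hgid term counts from the example above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of what namedistinct would report versus ordinary facet counts,
// using the hgid values from the example above (not Solr's implementation).
public class NamedDistinctSketch {
    public static void main(String[] args) {
        Map<String, Integer> facetCounts = new LinkedHashMap<>();
        facetCounts.put("HGPY045FD36D4000A", 1);
        facetCounts.put("HGPY0FBC6690453A9", 1);
        facetCounts.put("HGPY1E44ED6C4FB3B", 1);
        facetCounts.put("HGPY1FA631034A1B8", 1);
        facetCounts.put("HGPY3317ABAC43B48", 1);
        facetCounts.put("HGPY3A17B2294CB5A", 5);
        facetCounts.put("HGPY3ADD2B3D48C39", 1);

        // namedistinct: how many distinct terms (rows) the field has
        System.out.println(facetCounts.size());                 // 7

        // ordinary facet counts sum to the number of term occurrences
        int total = facetCounts.values().stream()
                               .mapToInt(Integer::intValue).sum();
        System.out.println(total);                              // 11
    }
}
```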
Re: Congratulations!
Hi Phillipe, Congrats, I am looking forward to start working with you too ;) On Tue, Apr 26, 2011 at 8:40 PM, Mark Miller markrmil...@gmail.com wrote: Congrats Phillipe! We are very excited to have you! Your proposal sounds great. - Mark On Apr 26, 2011, at 8:31 PM, Phillipe Ramalho wrote: Hi everyone, It seems my project was accepted, I am looking forward to start coding for Lucene. Thanks! - Phillipe Ramalho -- Forwarded message -- From: no-re...@socghop.appspotmail.com Date: Mon, Apr 25, 2011 at 2:48 PM Subject: Congratulations! To: phillipe.rama...@gmail.com Dear Phillipe, Congratulations! Your proposal Lucene-2979: Simplify configuration API of contrib Query Parser as submitted to Apache Software Foundation has been accepted for Google Summer of Code 2011. Over the next few days, we will add you to the private Google Summer of Code Student Discussion List. Over the next few weeks, we will send instructions to this list regarding turning in proof of enrollment, tax forms, etc. Now that you've been accepted, please take the opportunity to speak with your mentors about plans for the Community Bonding Period: what documentation should you be reading, what version control system will you need to set up, etc., before coding begins on May 23rd. Welcome to Google Summer of Code 2011! We look forward to having you with us. With best regards, The Google Summer of Code Program Administration Team -- Phillipe Ramalho - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2242:
-----
Attachment: SOLR-2242.patch

I noticed that with the original patch applied, SimpleFacetsTest would fail. The cause is a small backwards-compatibility bug: the patch wrapped the facet counts in a "counts" element in the response. That is valid when the namedistinct param is used, but when a user doesn't specify it, the old response format should be unchanged. This updated patch corrects the issue and SimpleFacetsTest now passes.

Get distinct count of names for a facet field
-----
Key: SOLR-2242
URL: https://issues.apache.org/jira/browse/SOLR-2242
Project: Solr
Issue Type: New Feature
Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2242.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch

When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct.
[jira] [Updated] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3041:
-----
Attachment: LUCENE-3041.patch

A much larger patch that implements full query AST walking. The problem with having the QueryProcessor fully external to Query#rewrite is that composite Querys would need to expose their children. This is a little messy and could be hard with more exotic user-made Querys. So this patch expands Query#rewrite to include the QueryProcessor; composite queries can then pass their children to the processor during their rewrite. For backwards compatibility, and simplicity, I've created a SimpleQueryProcessor which directly calls rewrite. This means casual users do not need to concern themselves with processing. Over time we can expose the QueryProcessor API through IndexSearcher and other situations.

Support Query Visting / Walking
-----
Key: LUCENE-3041
URL: https://issues.apache.org/jira/browse/LUCENE-3041
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Chris Male
Priority: Minor
Attachments: LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch

Out of the discussion in LUCENE-2868, it could be useful to add a generic Query Visitor / Walker that could be used for more advanced rewriting, optimizations, or anything that requires state to be stored as each Query is visited. We could keep the interface very simple:
{code}
public interface QueryVisitor {
  Query visit(Query query);
}
{code}
and then use a reflection-based visitor like Earwin suggested, which would allow implementors to provide visit methods for just the Querys they are interested in.
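The design Chris describes, where composite queries hand their children back to the processor during rewrite, can be sketched roughly as below. The toy classes are illustrative stand-ins for the patch's API, not its actual code; the caching mirrors the spirit of the RewriteCachingQueryProcessor mentioned earlier in the thread, and it answers Simon's question about reusing the rewrite of a repeated clause. Note the cache here is keyed by object identity, since these toy queries don't override equals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-ins for Lucene's Query classes; names are illustrative only.
abstract class Query {
    // Mirrors the patch's idea: rewrite receives the processor so composite
    // queries can hand their children back to it.
    abstract Query rewrite(QueryProcessor processor);
}

class TermQuery extends Query {
    final String term;
    TermQuery(String term) { this.term = term; }
    Query rewrite(QueryProcessor processor) { return this; } // leaf: nothing to walk
}

class BooleanQuery extends Query {
    final List<Query> clauses = new ArrayList<>();
    Query rewrite(QueryProcessor processor) {
        BooleanQuery rewritten = new BooleanQuery();
        for (Query clause : clauses) {
            // children go through the processor, not directly through rewrite
            rewritten.clauses.add(processor.process(clause));
        }
        return rewritten;
    }
}

// A processor that caches rewrites: a clause object that appears twice
// (Simon's Fuzzy:A case) is only rewritten once.
class QueryProcessor {
    final Map<Query, Query> cache = new HashMap<>();
    int rewrites = 0; // how many actual rewrite calls happened

    Query process(Query query) {
        Query cached = cache.get(query);
        if (cached != null) return cached;
        rewrites++;
        Query rewritten = query.rewrite(this);
        cache.put(query, rewritten);
        return rewritten;
    }
}
```

On a tree like OR(AND(A, B), AND(A, C)) built with a shared A instance, the walk performs six rewrites instead of seven: the top query, both AND clauses, and A, B, C, with the second occurrence of A served from the cache.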
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025669#comment-13025669 ]

Bill Bell commented on SOLR-2242:
-----
Lance Norskog, What do you want it to be called? I would use a committer to take this issue on. It has several votes, and lots of downloads. People are using it successfully already. Do you want me to switch numFacetTerms to numFacetNames? Anything else? I feel like we are going in circles on this issue.

This will output the numFacetTerms AND hgid: http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&facet.mincount=1&f.hgid.facet.numFacetTerms=2
{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="numFacetTerms">7</int> <!-- this is not 11 -->
    <lst name="counts">
      <int name="HGPY045FD36D4000A">1</int>
      <int name="HGPY0FBC6690453A9">1</int>
      <int name="HGPY1E44ED6C4FB3B">1</int>
      <int name="HGPY1FA631034A1B8">1</int>
      <int name="HGPY3317ABAC43B48">1</int>
      <int name="HGPY3A17B2294CB5A">5</int>
      <int name="HGPY3ADD2B3D48C39">1</int>
    </lst>
  </lst>
</lst>
{code}
[jira] [Issue Comment Edited] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026103#comment-13026103 ]

Bill Bell edited comment on SOLR-2242 at 4/28/11 3:51 AM:
-----
Lance Norskog, What do you want it to be called? I would use a committer to take this issue on. It has several votes, and lots of downloads. People are using it successfully already. Do you want me to switch numFacetTerms to numFacetNames? Anything else? I feel like we are going in circles on this issue.

This will output the numFacetNames AND hgid: http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&facet.mincount=1&f.hgid.facet.numFacetNames=2
{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="numFacetNames">7</int> <!-- this is not 11 -->
    <lst name="counts">
      <int name="HGPY045FD36D4000A">1</int>
      <int name="HGPY0FBC6690453A9">1</int>
      <int name="HGPY1E44ED6C4FB3B">1</int>
      <int name="HGPY1FA631034A1B8">1</int>
      <int name="HGPY3317ABAC43B48">1</int>
      <int name="HGPY3A17B2294CB5A">5</int>
      <int name="HGPY3ADD2B3D48C39">1</int>
    </lst>
  </lst>
</lst>
{code}
[jira] [Updated] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3041:
-----
Attachment: LUCENE-3041.patch

Updated patch which removes the stupid test I'd included.
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated SOLR-2242:
-----
Attachment: SOLR-2242.solr3.1.patch

Putting up or shutting up :)
[jira] [Issue Comment Edited] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026124#comment-13026124 ]

Lance Norskog edited comment on SOLR-2242 at 4/28/11 5:33 AM:
-----
Putting up or shutting up :) This splits apart whether to count terms vs. whether to count docs per term. They are independent concepts. Instead of 'numFacetTerms=0/1/2' it is 'numTerms=true/false'. If you set 'numTerms=true', it counts terms. If you set facet.limit=0, it does not do the facet search and does not count docs per term. If you set 'numTerms=false' and 'facet.limit=0', it does nothing. And, everything is called 'facet' and 'term' :)
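The two independent switches Lance describes can be sketched as a small decision table; the parameter names follow his comment, while the class and method below are hypothetical, not actual Solr code:

```java
// Sketch of the two independent switches: 'numTerms' decides whether the
// number of distinct terms is reported, 'facet.limit' decides whether the
// normal per-term doc counts are computed at all.
class FacetPlanSketch {
    static String plan(boolean numTerms, int facetLimit) {
        boolean countTerms = numTerms;              // report the number of distinct terms
        boolean countDocsPerTerm = facetLimit != 0; // run the normal per-term facet counts
        if (countTerms && countDocsPerTerm) return "terms + per-term counts";
        if (countTerms) return "terms only";
        if (countDocsPerTerm) return "per-term counts only";
        return "nothing";
    }
}
```

For example, numTerms=true with facet.limit=0 yields the term count alone without running the facet search, and numTerms=false with facet.limit=0 does nothing, exactly as described.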
[jira] [Issue Comment Edited] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026124#comment-13026124 ]

Lance Norskog edited comment on SOLR-2242 at 4/28/11 5:33 AM:
-----
Putting up or shutting up :) This splits apart whether to count terms vs. whether to count docs per term. They are independent concepts. Instead of 'numFacetTerms=0/1/2' it is 'numTerms=true/false'. If you set 'numTerms=true', it counts terms. If you set facet.limit=0, it does not do the facet search and does not count docs per term. If you set 'numTerms=false' and 'facet.limit=0', it does nothing. 'numFacetTerms' is redundant; we know it's all about facets. Thus, 'numTerms'.
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026129#comment-13026129 ]

Lance Norskog commented on LUCENE-3041:
-----
This is an excellent opportunity to redefine Queries as immutable, which would make query rewriting an order of magnitude safer.
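Lance's point is that if rewriting always returns a new Query instead of mutating the receiver, a walker can never corrupt a query shared with a running search. A hypothetical sketch (this class is illustrative, not a Lucene API):

```java
// Hypothetical immutable query: all fields final, so a rewrite step must
// return a new instance instead of mutating state shared with other code.
final class ImmutableBoostQuery {
    final String term;
    final float boost;

    ImmutableBoostQuery(String term, float boost) {
        this.term = term;
        this.boost = boost;
    }

    // "Rewriting" produces a fresh object; the original is untouched, so two
    // searches holding the same query instance cannot interfere.
    ImmutableBoostQuery withBoost(float newBoost) {
        return new ImmutableBoostQuery(term, newBoost);
    }
}
```

A rewrite that changes the boost leaves the original query's boost intact, which is precisely the safety property mutable rewriting lacks.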
Re: [Lucene.Net] Lucene.NET 2.9.4g -- only usable with .NET 4.0 ?
Am I correct that your trial code changes make this version of Lucene.NET incompatible and un-buildable with any version of .NET prior to 4.0?

As I understand it, 2.9.4g only replaces non-generic collections with generic ones, and generics were introduced in .NET Framework 2.0. Oh, sorry, I took a look at the code just to make sure: it does use SortedSet, a .NET 4 feature, and it also uses HashSet, introduced in .NET 3.5.

Kind regards, Amanuel