[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 639 - Failure!

2013-07-18 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/639/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.
REGRESSION:  org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean

Error Message:
IOException occured when talking to server at: 
https://127.0.0.1:54573/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:54573/solr/collection1
at __randomizedtesting.SeedInfo.seed([37AEFB665E4976:63DCAFF9063A8E54]:0)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
at org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean(TestBatchUpdate.java:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)

[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 6541 - Failure!

2013-07-18 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6541/
Java: 32bit/jdk1.6.0_45 -server -XX:+UseConcMarkSweepGC

2 tests failed.
REGRESSION:  org.apache.solr.core.TestJmxIntegration.testJmxRegistration

Error Message:
No SolrDynamicMBeans found

Stack Trace:
java.lang.AssertionError: No SolrDynamicMBeans found
at __randomizedtesting.SeedInfo.seed([EA81515636EF4332:6450356C5BAE1B57]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.apache.solr.core.TestJmxIntegration.testJmxRegistration(TestJmxIntegration.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:662)


REGRESSION:  org.apache.solr.core.TestJmxIntegration.testJmxUpdate

Error Message:
No mbean found for SolrIndexSearcher

Stack Trace:

[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set

2013-07-18 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712195#comment-13712195
 ] 

Alan Woodward commented on SOLR-4478:
-

Have found another problem here - what to do with core-specific properties?  
Core properties are passed to the SolrConfig object at construction, so there's 
no way at present to use a new set of properties with an existing configset.  
Same with IndexSchema, which re-uses the resource loader from SolrConfig.

 Allow cores to specify a named config set
 -

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc... I propose an 
 additional parameter specified on the core node in solr.xml or as a 
 parameter in the discovery mode core.properties file, call it configSet, 
 where the value provided is a path to a directory, either absolute or 
 relative. Really, this is as though you copied the conf directory somewhere 
 to be used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for comments. I know that there are already 
 thoughts about how this all should work floating around, so before I start 
 any work on this I thought I'd at least get an idea of whether this is the 
 way people are thinking about going.
 Configset can be either a relative or an absolute path; if relative, it's 
 assumed to be relative to solr_home.
 Thoughts?
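 For illustration only, a discovery-mode core.properties using the proposed 
 parameter might look like this (property and value names hypothetical, per 
 the straw-man above):

 name=core1
 configSet=myconf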

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Contentions observed in lucene execution

2013-07-18 Thread RameshIyerV
Hi All,

I need some help in analyzing some contentions we observe in the Lucene
execution.

We are supporting the Sterling 9.0 fulfillment application and it uses
Lucene 2.4 for catalog search functionality.

---The Issue---
This system has been live in production since Nov 2012, and only recently
(mid-June 2013) has our application started forming stuck threads during
Lucene invocations; this causes our application to crash.

This occurs 2-3 times a week; on other days we see spikes of very slow
performance at the exact spots that cause the stuck threads.

---The research---
We have validated that the data and usage have not grown between Jan 2012
and now.

We took snapshots of the code execution (through VisualVM), and for slow
running threads we validated that too much time is spent at certain spots
(these very same spots appear in the stack traces of the stuck threads).

---Help needed---
If you can guide me on what kinds of contention (heap, IO, data, CPU, JVM
params) can cause such behavior, it would really help.


---Lucene invocation contentions observed---
(We find stuck threads & slowness at the following spots, ordered by
severity [high to low])
1.  java.io.RandomAccessFile.readBytes(Native Method)
java.io.RandomAccessFile.read(RandomAccessFile.java:338)

org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)

org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)

2.  org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:167)

org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:373)

org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)

org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351)

3.  org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
org.apache.lucene.index.TermBuffer.read(TermBuffer.java:65)
org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:127)

org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:389)

org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)

org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351)

4. java.io.RandomAccessFile.seek(Native Method)

org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:591)

org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Contentions-observed-in-lucene-execution-tp4078796.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4734) FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight

2013-07-18 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4734:
-

Attachment: LUCENE-4734.patch

Ryan, I iterated over your patch in order to be able to handle a few more 
queries, specifically phrase queries that contain gaps or have several terms at 
the same position.

It is very hard to handle all possibilities without making the highlighting 
complexity explode. I'm looking forward to LUCENE-2878 so that highlighting can 
be more efficient and doesn't need to duplicate the query interpretation logic 
anymore.

 FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight
 

 Key: LUCENE-4734
 URL: https://issues.apache.org/jira/browse/LUCENE-4734
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.0, 4.1, 5.0
Reporter: Ryan Lauck
  Labels: fastvectorhighlighter, highlighter
 Fix For: 4.4

 Attachments: lucene-4734.patch, LUCENE-4734.patch


 If a proximity phrase query overlaps with any other query term it will not be 
 highlighted.
 Example Text: "A B C D E F G"
 Example Queries: 
 "B E"~10 D
 (D will be highlighted instead of B C D E)
 "B E"~10 "C F"~10
 (nothing will be highlighted)
 This can be traced to the FieldPhraseList constructor's inner while loop. 
 From the first example query, the first TermInfo popped off the stack will be 
 "B". The second TermInfo will be "D", which will not be found in the submap 
 for "B E"~10 and will trigger a failed match.
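 For reference, a minimal sketch of how such a query reaches the FVH in 4.x 
 (field name, analyzer, and the reader/docId setup are assumed; the queries 
 mirror the first example above):

 import org.apache.lucene.index.Term;
 import org.apache.lucene.search.BooleanClause.Occur;
 import org.apache.lucene.search.BooleanQuery;
 import org.apache.lucene.search.PhraseQuery;
 import org.apache.lucene.search.TermQuery;
 import org.apache.lucene.search.vectorhighlight.FastVectorHighlighter;
 import org.apache.lucene.search.vectorhighlight.FieldQuery;

 // "B E"~10 D -- the overlapping phrase is the part that fails to highlight
 PhraseQuery phrase = new PhraseQuery();
 phrase.add(new Term("f", "b"));
 phrase.add(new Term("f", "e"));
 phrase.setSlop(10);
 BooleanQuery query = new BooleanQuery();
 query.add(phrase, Occur.SHOULD);
 query.add(new TermQuery(new Term("f", "d")), Occur.SHOULD);

 FastVectorHighlighter fvh = new FastVectorHighlighter();
 FieldQuery fieldQuery = fvh.getFieldQuery(query);
 // field "f" must be indexed with term vectors, positions and offsets for FVH
 String[] fragments = fvh.getBestFragments(fieldQuery, reader, docId, "f", 100, 1);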

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (LUCENE-4118) FastVectorHighlighter fail to highlight taking in input some proximity query.

2013-07-18 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand closed LUCENE-4118.


Resolution: Duplicate

Duplicate of LUCENE-4734

 FastVectorHighlighter fail to highlight taking in input some proximity query.
 -

 Key: LUCENE-4118
 URL: https://issues.apache.org/jira/browse/LUCENE-4118
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 3.4, 5.0
Reporter: Emanuele Lombardi
Assignee: Koji Sekiguchi
  Labels: FastVectorHighlighter
 Attachments: FVHPatch.txt


 There are 2 related bugs with proximity queries:
 1) If a phrase contains n repeated terms, the FVH module fails to highlight 
 them.
 (see testRepeatedTermsWithSlop)
 2) If you search the terms reversed, the FVH module fails to highlight them.
 (see testReversedTermsWithSlop)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-07-18 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712241#comment-13712241
 ] 

Markus Jelsma commented on SOLR-4816:
-

Joel, it is working perfectly and already runs fine in one production 
environment. Sending data from 20 Hadoop reducers to 10 Solr SSD nodes is 
about 30% more efficient with routing than with the current method. We didn't 
implement the routable deletes - we're still using SolrServer.deleteById(); it 
seems UpdateRequestExt is not going to be the definitive API to talk to, right?

I assume it won't make it into 4.4, but we should make an effort to get it 
committed to trunk and/or 4.5 some day soon.

 Add document routing to CloudSolrServer
 ---

 Key: SOLR-4816
 URL: https://issues.apache.org/jira/browse/SOLR-4816
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.3
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.4

 Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch


 This issue adds the following enhancements to CloudSolrServer's update logic:
 1) Document routing: Updates are routed directly to the correct shard leader 
 eliminating document routing at the server.
 2) Optional parallel update execution: Updates for each shard are executed in 
 a separate thread so parallel indexing can occur across the cluster.
 These enhancements should allow for near linear scalability on indexing 
 throughput.
 Usage:
 CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
 cloudClient.setParallelUpdates(true);
 SolrInputDocument doc1 = new SolrInputDocument();
 doc1.addField("id", 0);
 doc1.addField("a_t", "hello1");
 SolrInputDocument doc2 = new SolrInputDocument();
 doc2.addField("id", 2);
 doc2.addField("a_t", "hello2");
 UpdateRequest request = new UpdateRequest();
 request.add(doc1);
 request.add(doc2);
 request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
 NamedList response = cloudClient.request(request); // Returns a backwards 
 // compatible condensed response.
 // To get a more detailed response, down cast to RouteResponse:
 CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse) response;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



4.0 and 4.1 FieldCacheImpl.DocTermsImpl.exists(docid) possibly broken

2013-07-18 Thread Doron Cohen
Hi, just an FYI - may be helpful for anyone obliged to use 4.0.0 or 4.1.0 -
it seems that this method is actually doing the opposite of its intention.

I did not find mentions of this in the lists or elsewhere.

This is the code for o.a.l.search.FieldCacheImpl.DocTermsImpl.exists(int):
public boolean exists(int docID) {
  return docToOffset.get(docID) == 0;
}

Its description says: "Returns true if this doc has this field and is not
deleted."
But it returns true for docs not containing the field and false for those
that do contain it.
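
If that reading is correct, the intended check would presumably be the
negation - a guess based on the javadoc, not the committed fix:

public boolean exists(int docID) {
  // offset 0 apparently means "no term stored for this doc"
  return docToOffset.get(docID) != 0;
}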

A simple workaround is not to call this method before calling getTerm(),
but rather just rely on getTerm()'s logic: "... returns the same BytesRef,
or an empty (length=0) BytesRef if the doc did not have this field or was
deleted."

So usage code can be like this:
DocTerms values = FieldCache.DEFAULT.getTerms(reader, FIELD_NAME);
BytesRef term = new BytesRef();
for (int docid = 0; docid < values.size(); docid++) {
  term = values.getTerm(docid, term);
  if (term.length > 0) {
    doSomethingWith(term.utf8ToString());
  }
}
FieldCache.DEFAULT.purge(reader);

I am not sure about the overhead of this compared to first checking
exists(), but it at least works correctly.

The code for exists() was as above until R1442497 (
http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java?revision=1442497&view=markup)
and then in R1443717 (
http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java?r1=1442497&r2=1443717&diff_format=h)
the API was changed as part of LUCENE-4547 (DocValues improvements), which
was included in 4.2.

Simple code to demonstrate this (here with 4.1 but same results with 4.0):

RAMDirectory d = new RAMDirectory();
IndexWriter w = new IndexWriter(d, new IndexWriterConfig(Version.LUCENE_41,
    new SimpleAnalyzer(Version.LUCENE_41)));
w.addDocument(new Document()); // Empty doc (0, 0)
Document doc = new Document(); // Real doc (1, 1)
doc.add(new StringField("f1", "v1", Store.NO));
w.addDocument(doc);
w.addDocument(new Document()); // Empty doc (2, 2)
w.addDocument(new Document()); // Empty doc (3, 3)
w.commit(); // Commit - so we'll have two atomic readers
doc = new Document(); // Real doc (0, 4)
doc.add(new StringField("f1", "v2", Store.NO));
w.addDocument(doc);
w.addDocument(new Document()); // Empty doc (1, 5)
w.close();

IndexReader r = DirectoryReader.open(d);
BytesRef br = new BytesRef();
for (AtomicReaderContext leaf : r.leaves()) {
  System.out.println("--- new atomic reader");
  AtomicReader reader = leaf.reader();
  DocTerms a = FieldCache.DEFAULT.getTerms(reader, "f1");
  for (int i = 0; i < reader.maxDoc(); ++i) {
    int n = leaf.docBase + i;
    System.out.println(n + " exists: " + a.exists(i));
    br = a.getTerm(i, br);
    if (br.length > 0) {
      System.out.println(n + "  " + br.utf8ToString());
    }
  }
}

The result printing:

  --- new atomic reader
  0 exists: true
  1 exists: false
  1  v1
  2 exists: true
  3 exists: true
  --- new atomic reader
 4 exists: false
 4  v2
 5 exists: true

Indeed, exists() results are wrong.

So again, just an FYI, as this API no longer exists...

Regards,
Doron


[jira] [Updated] (LUCENE-5091) Modify SpanNotQuery to act as SpanNotNearQuery too

2013-07-18 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated LUCENE-5091:


Fix Version/s: (was: 4.4)
   4.5

 Modify SpanNotQuery to act as SpanNotNearQuery too
 --

 Key: LUCENE-5091
 URL: https://issues.apache.org/jira/browse/LUCENE-5091
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: 4.3.1
Reporter: Tim Allison
Priority: Minor
 Fix For: 4.5

 Attachments: LUCENE-5091.patch.txt


 With very small modifications, SpanNotQuery can act as a SpanNotNearQuery.
 To find "a" but not if "b" appears 3 tokens before or 4 tokens after "a":
 new SpanNotQuery(a, b, 3, 4)
 The original constructor still exists and calls SpanNotQuery(a, b, 0, 0).
 Patch with tests is on the way.
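 As a rough illustration of the proposed four-argument form (field and terms 
 are made up; the four-argument constructor is what the patch adds):

 import org.apache.lucene.index.Term;
 import org.apache.lucene.search.spans.SpanNotQuery;
 import org.apache.lucene.search.spans.SpanQuery;
 import org.apache.lucene.search.spans.SpanTermQuery;

 SpanQuery include = new SpanTermQuery(new Term("body", "a"));
 SpanQuery exclude = new SpanTermQuery(new Term("body", "b"));
 // reject matches of "a" when "b" occurs within 3 tokens before
 // or 4 tokens after the match
 SpanQuery q = new SpanNotQuery(include, exclude, 3, 4);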

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5091) Modify SpanNotQuery to act as SpanNotNearQuery too

2013-07-18 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712253#comment-13712253
 ] 

Tim Allison commented on LUCENE-5091:
-

With the push for 4.4 on, I've moved this to 4.5.  If someone has a chance to 
review this, that'd be great.  Thank you!

 Modify SpanNotQuery to act as SpanNotNearQuery too
 --

 Key: LUCENE-5091
 URL: https://issues.apache.org/jira/browse/LUCENE-5091
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: 4.3.1
Reporter: Tim Allison
Priority: Minor
 Fix For: 4.5

 Attachments: LUCENE-5091.patch.txt



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Programmatic Synonyms Filter (Lucene and/or Solr)

2013-07-18 Thread Jack Krupansky
Your best bet is to preprocess queries and expand synonyms in your own 
application layer. The Lucene/Solr synonym implementation, design, and 
architecture is fairly lightweight (although FST is a big improvement) and not 
architected for large and dynamic synonym sets.

Do you need multi-word phrase synonyms as well, or is this strictly single-word 
synonyms?
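
A minimal sketch of the preprocessing approach suggested above (all names 
hypothetical; the lookup would call the external API Shai describes, ideally 
with caching):

// Application-layer expansion, done before the query reaches Lucene/Solr.
public interface SynonymProvider {
  String[] getSynonyms(String word); // backed by the external system's API
}

public static String expandQuery(String query, SynonymProvider synonyms) {
  StringBuilder out = new StringBuilder();
  for (String word : query.split("\\s+")) {
    out.append('(').append(word);
    for (String syn : synonyms.getSynonyms(word)) {
      out.append(" OR ").append(syn);
    }
    out.append(") ");
  }
  return out.toString().trim();
}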

-- Jack Krupansky

From: Shai Erera 
Sent: Thursday, July 18, 2013 1:36 AM
To: dev@lucene.apache.org 
Subject: Programmatic Synonyms Filter (Lucene and/or Solr)

Hi


I was asked to integrate with a system which provides synonyms for words 
through API. I checked the existing synonym filters in Lucene and Solr and they 
all seem to take a synonyms map up front. 

E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's 
not really programmatic in the sense that I can provide an impl which will pull 
the synonyms through the other system's API.


Solr SynonymFilterFactory just loads the synonyms from a file into a 
SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can 
extend that one either.


The problem is that the synonyms DB I should integrate with is HUGE and will 
probably not fit in RAM (SynonymMap). Nor is it currently possible to pull all 
available synonyms from it in one go. The API I have is something like String[] 
getSynonyms(String word).


So I have a few questions:


1) Did I miss a Filter which does take a programmatic syn-map which I can 
provide my own impl to?


2) If not, would it make sense to modify SynonymMap to offer a 
getSynonyms(word) API (using BytesRef / CharsRef of course), with an 
FSTSynonymMap default impl, so that users can provide their own impl, e.g. 
one not requiring everything to be in RAM?


2.1) Side-effect benefit, I think, is that we won't require everyone to deal 
with the FST API that way, though I'll admit I cannot think of many use cases 
for not using SynonymFilter as-is ...


3) If the answer to (1) and (2) is NO, I guess my only option is to implement 
my own SynonymFilter, copying most of the code from Lucene's ... right?

Shai


Re: Programmatic Synonyms Filter (Lucene and/or Solr)

2013-07-18 Thread Shai Erera
The examples I've seen so far are single words. But I learned today
something new .. the number of synonyms returned for a word may be in the
range of hundreds, sometimes even thousands.
So I'm not sure query-time synonyms may work at all .. what do you think?

Shai





[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica

2013-07-18 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712297#comment-13712297
 ] 

Markus Jelsma commented on SOLR-4260:
-

FYI: we're still seeing major inconsistencies: facet counts are off, and when 
inspecting leaders and replicas we notice not all are in sync. This is on 
yesterday's trunk, starting from an empty index. There were no node failures 
during indexing. Shard_b's stats for example:

node 2 shard b
{code}
Last Modified: about a minute ago
Num Docs: 158964
Max Doc: 158964
Deleted Docs: 0
Version: 4479
Segment Count: 1
{code}

node 3 shard b
{code}
Last Modified: 2 minutes ago
Num Docs: 158298
Max Doc: 158298
Deleted Docs: 0
Version: 2886
{code}

Size and versions are also different. The cluster is optimized/forceMerged, 
but that doesn't change the facts, as expected. At least one other shard also 
has differences between its two replicas; I haven't manually checked the others.
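
One way to compare replicas directly (a sketch; the core URLs are hypothetical) 
is to query each core with distrib=false, so each count comes from that core 
alone:

{code}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

String[] cores = {
  "http://node2:8983/solr/collection1_shard_b_replica1", // hypothetical URLs
  "http://node3:8983/solr/collection1_shard_b_replica2"
};
SolrQuery q = new SolrQuery("*:*");
q.set("distrib", "false"); // count only this core's own index
for (String url : cores) {
  HttpSolrServer server = new HttpSolrServer(url);
  System.out.println(url + " numFound=" + server.query(q).getResults().getNumFound());
  server.shutdown();
}
{code}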

 Inconsistent numDocs between leader and replica
 ---

 Key: SOLR-4260
 URL: https://issues.apache.org/jira/browse/SOLR-4260
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 5.0


 After wiping all cores and reindexing some 3.3 million docs from Nutch using 
 CloudSolrServer, we see inconsistencies between the leader and replica for 
 some shards.
 Each core holds about 3.3k documents. For some reason 5 out of 10 shards have 
 a small deviation in the number of documents. The leader and slave deviate 
 by roughly 10-20 documents, not more.
 Results hopping ranks in the result set for identical queries got my 
 attention: there were small IDF differences for exactly the same record, 
 causing it to shift positions in the result set. During those tests no 
 records were indexed. Consecutive catch-all queries also return different 
 numDocs.
 We're running a 10-node test cluster with 10 shards and a replication factor 
 of two, and frequently reindex using a fresh build from trunk. I hadn't seen 
 this issue for quite some time until a few days ago.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: for those of you using gmail...

2013-07-18 Thread Michael McCandless
On Wed, Jul 17, 2013 at 10:02 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : And seems that it returns no result for query:
 :
 :   from:jenk...@thetaphi.de subject:build 6605  ANY_WORD_NOT_IN_TITLE
 :
 : Maybe for some mails, only title field are taken into consideration?

 Ah ... interesting.  I wonder if maybe something about the content of the
 jenkins emails + the multipart/mixed wrapping done by ezmlm occasionally
 causes gmail to balk at trying to parse the various parts of *some*
 jenkins emails (maybe just the ones with multi-byte characters?) so
 you are left with only the headers being searchable?

 That still doesn't explain this discrepancy though...

Maybe it's because build 6605 was a biiig email (1.9 MB), and Google
punted indexing any text whatsoever from its body?  I see this in the
build 6605 email:

   [Message clipped]  View entire message

 :  from:jenk...@thetaphi.de regression
 : 
 :  I only get results up to Jul 2, even though there are many build
 :  failures after that.
 :
 : A recent search got results up to #6530. Still no 6605.

 mike says the newest email gmail will return from that search is Jul 2,
 but Han, myself, (and IIRC several other people) are all seeing lots of
 results since then ... just not all of them, notably the specific one
 Mike asked about (Lucene-Solr-trunk-Linux (64bit/jdk1.8.0-ea-b96) - Build
 # 660 sent 13 hours ago)

In fact, now when I run this search, I'm seeing additional results
after Jul 2!  And, they seem to be the smaller emails.

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Programmatic Synonyms Filter (Lucene and/or Solr)

2013-07-18 Thread Jack Krupansky
Maybe a custom search component would be in order, to “enrich” the incoming 
query. Again, preprocessing the query for synonym expansion before Solr parses 
it. It could call the external synonym API and cache synonyms as well.

But, I’d still lean towards preprocessing in an application layer. Although, 
for hundreds or thousands of synonyms it would probably hit the common 
2048-character URL limit in some containers, which would need to be raised.
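
A rough sketch of such a component (the class name and the expand() hook are 
made up; prepare/process/getDescription/getSource are the SearchComponent 
hooks in 4.x):

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class SynonymEnrichComponent extends SearchComponent {
  @Override
  public void prepare(ResponseBuilder rb) {
    // Rewrite q before the query parser ever sees it.
    ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
    String q = params.get(CommonParams.Q);
    if (q != null) {
      params.set(CommonParams.Q, expand(q)); // call external API + cache here
      rb.req.setParams(params);
    }
  }

  @Override
  public void process(ResponseBuilder rb) { /* nothing to do at process time */ }

  private String expand(String q) { return q; /* placeholder */ }

  @Override
  public String getDescription() { return "synonym enrichment (sketch)"; }

  @Override
  public String getSource() { return "sketch"; }
}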

-- Jack Krupansky




[jira] [Created] (SOLR-5047) Color Shard/Collection Graph Nodes Based on Child Node Statuses

2013-07-18 Thread Thomas Murphy (JIRA)
Thomas Murphy created SOLR-5047:
---

 Summary: Color Shard/Collection Graph Nodes Based on Child Node 
Statuses
 Key: SOLR-5047
 URL: https://issues.apache.org/jira/browse/SOLR-5047
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Thomas Murphy
Priority: Trivial


In the Solr Admin UI, only the leaf (individual core) nodes have colored 
statuses, leaving collections and shards as no-context nodes. Having status 
information for collections and shards would improve the ability for an 
administrator to recognize which collections and shards are influenced by 
server downtime on certain cores.

With increasing severity, the current core statuses are: Active, Recovering, 
Down, Recovery Failed, Gone

The simplest plan:
* shards inherit the best status of their cores; one functioning core of a 
shard implies that the shard is functional
* collections inherit the worst status of their shards; one missing shard 
implies that the collection is not able to access all of its data

More complicated, but accurate, would be appropriate indication of partially 
failed shards and their influence on the total health of the collection.
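
A tiny sketch of the simple plan (enum names and ordering are taken from the 
severity list above; everything else is hypothetical):

import java.util.Collections;
import java.util.List;

// Ordinal order doubles as severity order, least to most severe.
enum Status { ACTIVE, RECOVERING, DOWN, RECOVERY_FAILED, GONE }

// A shard is as healthy as its healthiest core.
static Status shardStatus(List<Status> coreStatuses) {
  return Collections.min(coreStatuses);
}

// A collection is as unhealthy as its unhealthiest shard.
static Status collectionStatus(List<Status> shardStatuses) {
  return Collections.max(shardStatuses);
}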

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-07-18 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712301#comment-13712301
 ] 

Joel Bernstein commented on SOLR-4816:
--

Markus, thanks for the info. Glad to hear it's working for you in production. 

Just wondering if you've turned on parallel updates and what batch size you're 
using?

I'm thinking that large batch sizes with parallel updates would be very 
beneficial for performance. That way you would get long stretches of parallel 
indexing across the cluster.

I suspect that UpdateRequestExt will eventually get folded into UpdateRequest 
based on the comments in the source.

I'll ping Mark and see what he thinks about getting this committed.  

 Add document routing to CloudSolrServer
 ---

 Key: SOLR-4816
 URL: https://issues.apache.org/jira/browse/SOLR-4816
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.3
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.4

 Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica

2013-07-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712300#comment-13712300
 ] 

Mark Miller commented on SOLR-4260:
---

See anything in the logs about zk expirations?

 Inconsistent numDocs between leader and replica
 ---

 Key: SOLR-4260
 URL: https://issues.apache.org/jira/browse/SOLR-4260
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 5.0



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica

2013-07-18 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712314#comment-13712314
 ] 

Markus Jelsma commented on SOLR-4260:
-

I've already restarted the job and enabled logging! It's going to take a while 
:)

 Inconsistent numDocs between leader and replica
 ---

 Key: SOLR-4260
 URL: https://issues.apache.org/jira/browse/SOLR-4260
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 5.0



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-07-18 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712318#comment-13712318
 ] 

Markus Jelsma commented on SOLR-4816:
-

Batch size is about 394 IIRC, not very large indeed. I don't think I enabled 
parallel updates.

 Add document routing to CloudSolrServer
 ---

 Key: SOLR-4816
 URL: https://issues.apache.org/jira/browse/SOLR-4816
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.3
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.4

 Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-07-18 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712321#comment-13712321
 ] 

Joel Bernstein commented on SOLR-4816:
--

Thanks Mark.


 Add document routing to CloudSolrServer
 ---

 Key: SOLR-4816
 URL: https://issues.apache.org/jira/browse/SOLR-4816
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.3
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.5

 Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4816) Add document routing to CloudSolrServer

2013-07-18 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4816:
--

Fix Version/s: (was: 4.4)
   4.5

It's def too late for 4.4 (we already branched and the first rc vote is 
ongoing), but high priority for 4.5.

 Add document routing to CloudSolrServer
 ---

 Key: SOLR-4816
 URL: https://issues.apache.org/jira/browse/SOLR-4816
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.3
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.5

 Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release 4.4

2013-07-18 Thread Mark Miller
Bah. You are right. We should respin for this.

I'll update it.

- Mark

On Jul 18, 2013, at 1:06 AM, Jack Krupansky j...@basetechnology.com wrote:

 -1
 
 In the Solr example solrconfig.xml:
 
   <luceneMatchVersion>LUCENE_43</luceneMatchVersion>
 
 That should be:
 
   <luceneMatchVersion>LUCENE_44</luceneMatchVersion>
 
 Otherwise the 4.4 changes to the EdgeNGramTokenizer/Filter are disabled in 
 the Solr example config.
 
 See:
 
 https://issues.apache.org/jira/browse/LUCENE-3907
 (Eliminated side="back").
 
 -- Jack Krupansky
 
 -Original Message- From: Steve Rowe
 Sent: Tuesday, July 16, 2013 2:32 AM
 To: dev@lucene.apache.org
 Subject: [VOTE] Release 4.4
 
 Please vote to release Lucene and Solr 4.4, built off revision 1503555 of 
 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_4.
 
 RC0 artifacts are available at:
 
 http://people.apache.org/~sarowe/staging_area/lucene-solr-4.4.0-RC0-rev1503555
 
 The smoke tester passes for me.
 
 Here's my +1.
 
 Steve
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-07-18 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712323#comment-13712323
 ] 

Joel Bernstein commented on SOLR-4816:
--

Markus,

cloudClient.setParallelUpdates(true);

Will turn on parallel updates; in theory this should give you much better 
performance. Depending on the size of your docs you could probably go with a 
pretty high batch size. With ten servers, a batch size of 5000 would send 
roughly 500 docs to each server.
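
A quick sketch of that combination, using the API from the issue description 
below (zkAddress and the batch source are placeholders, not tested code):

{code:java}
CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
cloudClient.setParallelUpdates(true);

UpdateRequest req = new UpdateRequest();
// with 10 shards, a 5000-doc batch routes roughly 500 docs to each leader
for (SolrInputDocument doc : nextBatchOf5000()) { // hypothetical batch source
  req.add(doc);
}
cloudClient.request(req); // per-shard requests are executed on separate threads
{code}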

 Add document routing to CloudSolrServer
 ---

 Key: SOLR-4816
 URL: https://issues.apache.org/jira/browse/SOLR-4816
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.3
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.5

 Attachments: SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
 SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch


 This issue adds the following enhancements to CloudSolrServer's update logic:
 1) Document routing: Updates are routed directly to the correct shard leader 
 eliminating document routing at the server.
 2) Optional parallel update execution: Updates for each shard are executed in 
 a separate thread so parallel indexing can occur across the cluster.
 These enhancements should allow for near linear scalability on indexing 
 throughput.
 Usage:
 CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
 cloudClient.setParallelUpdates(true); 
 SolrInputDocument doc1 = new SolrInputDocument();
 doc1.addField("id", 0);
 doc1.addField("a_t", "hello1");
 SolrInputDocument doc2 = new SolrInputDocument();
 doc2.addField("id", 2);
 doc2.addField("a_t", "hello2");
 UpdateRequest request = new UpdateRequest();
 request.add(doc1);
 request.add(doc2);
 request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
 NamedList response = cloudClient.request(request); // Returns a backwards 
 compatible condensed response.
 // To get a more detailed response, downcast to RouteResponse:
 CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Programmatic Synonyms Filter (Lucene and/or Solr)

2013-07-18 Thread Shai Erera
Actually, after chatting w/ Mike about it, he made a good point about
making SynMap expose API like lookup(word), because that doesn't work with
multi-word synonyms (e.g. wi fi -> wifi). So I no longer think we
should change SynFilter. Since in my case it's 1:1 (so much I learned so
far), I should write my own TokenFilter.
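
A rough sketch of what such a filter could look like (SynonymProvider is a
hypothetical wrapper around the remote API; caching and error handling are
omitted; the original token is emitted first, synonyms stacked at the same
position):

import java.io.IOException;
import java.util.LinkedList;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class ApiSynonymFilter extends TokenFilter {
  public interface SynonymProvider {      // hypothetical wrapper for the remote API
    String[] getSynonyms(String word);
  }

  private final SynonymProvider provider;
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private final LinkedList<String> pending = new LinkedList<String>();
  private State savedState;

  public ApiSynonymFilter(TokenStream input, SynonymProvider provider) {
    super(input);
    this.provider = provider;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!pending.isEmpty()) {
      restoreState(savedState);                 // reuse offsets of the original token
      termAtt.setEmpty().append(pending.removeFirst());
      posIncAtt.setPositionIncrement(0);        // stack synonym at the same position
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    String[] syns = provider.getSynonyms(termAtt.toString());
    if (syns != null && syns.length > 0) {
      for (String s : syns) {
        pending.add(s);
      }
      savedState = captureState();
    }
    return true;                                // emit the original token first
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending.clear();
    savedState = null;
  }
}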

So now the question is whether to do it at indexing time or search time.
Each has pros and cons. I'll need to learn more about the DB first, e.g.
how many words have only tens of synonyms and how many thousands. I suspect
there's no single solution here, so will need to experiment with both.

Jack, I didn't quite follow the 2048 common limit -- is it a Solr limit of
some sort? If so, can you please elaborate?

Shai


On Thu, Jul 18, 2013 at 4:12 PM, Jack Krupansky j...@basetechnology.com wrote:

   Maybe a custom search component would be in order, to “enrich” the
 incoming query. Again, preprocessing the query for synonym expansion before
 Solr parses it. It could call the external synonym API and cache synonyms
 as well.

 But, I’d still lean towards preprocessing in an application layer.
 Although, for hundreds or thousands of synonyms it would probably hit the
 2048 common limit for URLs in some containers, which would need to be
 raised.

 -- Jack Krupansky

  *From:* Shai Erera ser...@gmail.com
 *Sent:* Thursday, July 18, 2013 8:54 AM
 *To:* dev@lucene.apache.org
 *Subject:* Re: Programmatic Synonyms Filter (Lucene and/or Solr)

  The examples I've seen so far are single words. But I learned today
 something new .. the number of synonyms returned for a word may be in the
 range of hundreds, sometimes even thousands.
 So I'm not sure query-time synonyms may work at all .. what do you think?

 Shai


 On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky 
 j...@basetechnology.com wrote:

   Your best bet is to preprocess queries and expand synonyms in your own
 application layer. The Lucene/Solr synonym implementation, design, and
 architecture is fairly lightweight (although FST is a big improvement) and
 not architected for large and dynamic synonym sets.

 Do you need multi-word phrase synonyms as well, or is this strictly
 single-word synonyms?

 -- Jack Krupansky

  *From:* Shai Erera ser...@gmail.com
 *Sent:* Thursday, July 18, 2013 1:36 AM
 *To:* dev@lucene.apache.org
 *Subject:* Programmatic Synonyms Filter (Lucene and/or Solr)

  Hi

 I was asked to integrate with a system which provides synonyms for words
 through API. I checked the existing synonym filters in Lucene and Solr and
 they all seem to take a synonyms map up front.

 E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so
 it's not really programmatic in the sense that I can provide an impl which
 will pull the synonyms through the other system's API.

 Solr SynonymFilterFactory just loads the synonyms from a file into a
 SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I
 can extend that one either.

 The problem is that the synonyms DB I should integrate with is HUGE and
 will probably not fit in RAM (SynonymMap). Nor is it currently possible to
 pull all available synonyms from it in one go. The API I have is something
 like String[] getSynonyms(String word).

 So I have few questions:

 1) Did I miss a Filter which does take a programmatic syn-map which I can
 provide my own impl to?

 2) If not, Would it make sense to modify SynonymMap to offer
 getSynonyms(word) API (using BytesRef / CharsRef of course), with an
 FSTSynonymMap default impl so that users can provide their own impl, e.g.
 not requiring everything to be in RAM?

 2.1) Side-effect benefit, I think, is that we won't require everyone to
 deal with the FST API that way, though I'll admit I cannot think of many use 
 cases for not using SynonymFilter as-is ...

 3) If the answer to (1) and (2) is NO, I guess my only option is to
 implement my own SynonymFilter, copying most of the code from Lucene's ...
 right?

 Shai





Re: 4.0 and 4.1 FieldCacheImpl.DocTermsImpl.exists(docid) possibly broken

2013-07-18 Thread Michael McCandless
Thanks Doron, that's definitely completely backwards!!

Good thing the API is gone.


Mike McCandless

http://blog.mikemccandless.com


On Thu, Jul 18, 2013 at 7:50 AM, Doron Cohen cdor...@gmail.com wrote:
 Hi, just an FYI - may be helpful for anyone obliged to use 4.0.0 or 4.1.0 -
 it seems that this method is actually doing the opposite of its intention.

 I did not find mentions of this in the lists or elsewhere.

 This is the code for o.a.l.search.FieldCacheImpl.DocTermsImpl.exists(int):
 public boolean exists(int docID) {
   return docToOffset.get(docID) == 0;
 }

 Its description says: "Returns true if this doc has this field and is not
 deleted."
 But it returns true for docs not containing the field and false for those
 that do contain it.
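
 Presumably the intended check was the inverse; a corrected version (my
 reading of the code, not anything committed) would be:

 public boolean exists(int docID) {
   return docToOffset.get(docID) != 0; // offset 0 means the doc has no term for this field
 }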

 A simple workaround is not to call this method before calling getTerm(),
 but rather just rely on getTerm()'s logic: "... returns the same BytesRef, or
 an empty (length=0) BytesRef if the doc did not have this field or was
 deleted."

 So usage code can be like this:
 DocTerms values = FieldCache.DEFAULT.getTerms(reader, FIELD_NAME);
 BytesRef term = new BytesRef();
 for (int docid = 0; docid < values.size(); docid++) {
   term = values.getTerm(docid, term);
   if (term.length > 0) {
     doSomethingWith(term.utf8ToString());
   }
 }
 FieldCache.DEFAULT.purge(reader);

 I am not sure about the overhead of this compared to first checking
 exists(), but it at least works correctly.

 The code for exists() was as above until R1442497
 (http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java?revision=1442497&view=markup)
 and then in R1443717
 (http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java?r1=1442497&r2=1443717&diff_format=h)
 the API was changed as part of LUCENE-4547 (DocValues improvements), which
 was included in 4.2.

 Simple code to demonstrate this (here with 4.1 but same results with 4.0):

 RAMDirectory d = new RAMDirectory();
 IndexWriter w = new IndexWriter(d, new IndexWriterConfig(Version.LUCENE_41,
 new SimpleAnalyzer(Version.LUCENE_41)));
 w.addDocument(new Document()); // Empty doc (0, 0)
 Document doc = new Document(); // Real doc (1, 1)
 doc.add(new StringField("f1", "v1", Store.NO));
 w.addDocument(doc);
 w.addDocument(new Document()); // Empty doc (2, 2)
 w.addDocument(new Document()); // Empty doc (3, 3)
 w.commit(); // Commit - so we'll have two atomic readers
 doc = new Document(); // Real doc (0, 4)
 doc.add(new StringField("f1", "v2", Store.NO));
 w.addDocument(doc);
 w.addDocument(new Document()); // Empty doc (1, 5)
 w.close();

 IndexReader r = DirectoryReader.open(d);
 BytesRef br = new BytesRef();
 for (AtomicReaderContext leaf : r.leaves()) {
   System.out.println("--- new atomic reader");
   AtomicReader reader = leaf.reader();
   DocTerms a = FieldCache.DEFAULT.getTerms(reader, "f1");
   for (int i = 0; i < reader.maxDoc(); ++i) {
     int n = leaf.docBase + i;
     System.out.println(n + " exists: " + a.exists(i));
     br = a.getTerm(i, br);
     if (br.length > 0) {
       System.out.println(n + "  " + br.utf8ToString());
     }
   }
 }

 The result printing:

   --- new atomic reader
   0 exists: true
   1 exists: false
   1  v1
   2 exists: true
   3 exists: true
   --- new atomic reader
  4 exists: false
  4  v2
  5 exists: true

 Indeed, exists() results are wrong.

 So again, just an FYI, as this API no longer exists...

 Regards,
 Doron

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Contentions observed in lucene execution

2013-07-18 Thread Michael McCandless
Lucene 2.4.x is quite ancient by now ...

FSDirectory.FSIndexInput is single-threaded in seeking/reading bytes,
which I think explains your 1 and 4.  Try using MMapDirectory, if you
are using a 64 bit JVM or if your index is tiny.  Newer Lucene
versions also have NIOFSDirectory, which is thread-friendly on Unix
(but not on Windows due to a JVM bug).
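
For reference, a minimal sketch of the MMapDirectory route using the
3.x-style API (the path is a placeholder; on 2.4 the concrete FSDirectory
implementation is selected differently):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

Directory dir = new MMapDirectory(new File("/path/to/index")); // mmap'd reads, no per-read lock
IndexReader reader = IndexReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);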

For 2 and 3, creating a FieldCache entry is also single threaded, but
this is a one-time event on the first search to the IndexReader
requiring that entry.  Lucene 4.x adds doc values which are much more
efficient to init at search time.
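
A hedged 4.x sketch of that (SortedDocValuesField is the 4.2+ name; writer
is assumed to be an open IndexWriter):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.util.BytesRef;

Document doc = new Document();
// stored as a per-document column; sorting/faceting reads it directly,
// with no FieldCache uninversion cost on the first search
doc.add(new SortedDocValuesField("category", new BytesRef("books")));
writer.addDocument(doc);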

But, what changed in your app?  Perhaps there's less RAM available to
the OS for caching IO pages (this could explain 1 and 4)?

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jul 18, 2013 at 6:46 AM, RameshIyerV rameshiy...@hotmail.com wrote:
 Hi All,

 I need some help in analyzing some contentions we observe in the Lucene
 execution.

 We are supporting the Sterling 9.0 fulfillment application and it uses
 Lucene 2.4 for catalog search functionality.

 ---The Issue
 This system has been live in production since Nov 2012, and only recently (mid June
 2013) has our application started forming stuck threads during lucene invocations;
 this causes our application to crash.

 This occurs 2 - 3 times a week; on other days we see spikes of very slow
 performance at the exact same places that cause stuck threads.

 ---The research---
 We have validated that the data or the usage has not grown between Jan 2012
 & now.

 We took a snapshot of the code execution (through VisualVM) and for slow
 running threads we validated that too much time is spent at certain spots
 (these very same spots appear in the stack trace of the stuck threads).

 ---Help needed---
 If you can guide me on what kind of contentions (heap, IO, Data, CPU, JVM
 params) can cause such a behavior it will really help.


 ---Lucene Invocation contentions observed---
 (We find stuck threads & slowness at the following spots, ordered in the
 order of severity [high to low])
 1.  java.io.RandomAccessFile.readBytes(Native Method)
 java.io.RandomAccessFile.read(RandomAccessFile.java:338)

 org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)

 org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)

 2.  org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
 org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:167)

 org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:373)
 
 org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)

 org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351)

 3.  org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
 org.apache.lucene.index.TermBuffer.read(TermBuffer.java:65)
 org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:127)

 org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:389)
 
 org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)

 org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351)

 4. java.io.RandomAccessFile.seek(Native Method)

 org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:591)

 org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Contentions-observed-in-lucene-execution-tp4078796.html
 Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Programmatic Synonyms Filter (Lucene and/or Solr)

2013-07-18 Thread Jack Krupansky
Container (e.g., Tomcat) limit. Configurable. I don’t recall the specifics.
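
(If the container is Tomcat, the knob I believe is meant here is the
connector's header-size limit in server.xml; attribute per Tomcat's
Connector docs, value illustrative:

<Connector port="8080" protocol="HTTP/1.1" maxHttpHeaderSize="65536"/>

Jetty and other containers have analogous settings.)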

-- Jack Krupansky

From: Shai Erera 
Sent: Thursday, July 18, 2013 9:46 AM
To: dev@lucene.apache.org 
Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr)

Actually, after chatting w/ Mike about it, he made a good point about making 
SynMap expose API like lookup(word), because that doesn't work with multi-word 
synonyms (e.g. wi fi -> wifi). So I no longer think we should change 
SynFilter. Since in my case it's 1:1 (so much I learned so far), I should write 
my own TokenFilter.


So now the question is whether to do it at indexing time or search time. Each 
has pros and cons. I'll need to learn more about the DB first, e.g. how many 
words have only tens of synonyms and how many thousands. I suspect there's no 
single solution here, so will need to experiment with both.


Jack, I didn't quite follow the 2048 common limit -- is it a Solr limit of some 
sort? If so, can you please elaborate?

Shai




On Thu, Jul 18, 2013 at 4:12 PM, Jack Krupansky j...@basetechnology.com wrote:

  Maybe a custom search component would be in order, to “enrich” the incoming 
query. Again, preprocessing the query for synonym expansion before Solr parses 
it. It could call the external synonym API and cache synonyms as well.

  But, I’d still lean towards preprocessing in an application layer. Although, 
for hundreds or thousands of synonyms it would probably hit the 2048 common 
limit for URLs in some containers, which would need to be raised.

  -- Jack Krupansky

  From: Shai Erera 
  Sent: Thursday, July 18, 2013 8:54 AM
  To: dev@lucene.apache.org 
  Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr)

  The examples I've seen so far are single words. But I learned today something 
new .. the number of synonyms returned for a word may be in the range of 
hundreds, sometimes even thousands.

  So I'm not sure query-time synonyms may work at all .. what do you think?

  Shai




  On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky j...@basetechnology.com 
wrote:

Your best bet is to preprocess queries and expand synonyms in your own 
application layer. The Lucene/Solr synonym implementation, design, and 
architecture is fairly lightweight (although FST is a big improvement) and not 
architected for large and dynamic synonym sets.

Do you need multi-word phrase synonyms as well, or is this strictly 
single-word synonyms?

-- Jack Krupansky

From: Shai Erera 
Sent: Thursday, July 18, 2013 1:36 AM
To: dev@lucene.apache.org 
Subject: Programmatic Synonyms Filter (Lucene and/or Solr)

Hi


I was asked to integrate with a system which provides synonyms for words 
through API. I checked the existing synonym filters in Lucene and Solr and they 
all seem to take a synonyms map up front. 

E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so 
it's not really programmatic in the sense that I can provide an impl which will 
pull the synonyms through the other system's API.


Solr SynonymFilterFactory just loads the synonyms from a file into a 
SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can 
extend that one either.


The problem is that the synonyms DB I should integrate with is HUGE and 
will probably not fit in RAM (SynonymMap). Nor is it currently possible to pull 
all available synonyms from it in one go. The API I have is something like 
String[] getSynonyms(String word).


So I have few questions:


1) Did I miss a Filter which does take a programmatic syn-map which I can 
provide my own impl to?


2) If not, Would it make sense to modify SynonymMap to offer 
getSynonyms(word) API (using BytesRef / CharsRef of course), with an 
FSTSynonymMap default impl so that users can provide their own impl, e.g. not 
requiring everything to be in RAM?


2.1) Side-effect benefit, I think, is that we won't require everyone to 
deal with the FST API that way, though I'll admit I cannot think of many use 
cases for not using SynonymFilter as-is ...


3) If the answer to (1) and (2) is NO, I guess my only option is to 
implement my own SynonymFilter, copying most of the code from Lucene's ... 
right?

Shai




[jira] [Commented] (SOLR-4860) MoreLikeThisHandler doesn't work with numeric or date fields in 4.x

2013-07-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712365#comment-13712365
 ] 

Yonik Seeley commented on SOLR-4860:


It's unfortunately a lucene limitation - numeric type fields no longer work 
since they are encoded so differently (using a different attribute rather than 
a text attribute).  I think we should probably just ignore numeric-type fields.
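
A hedged sketch of the "just ignore numeric fields" idea at the schema level
(the TrieField check stands in for "numeric-type"; the actual integration
point in MoreLikeThisHandler is not shown in this thread):

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.TrieField;

// Return only the mlt.fl fields that are safe to analyze for MLT terms.
static List<String> dropNumericFields(IndexSchema schema, List<String> mltFieldNames) {
  List<String> kept = new ArrayList<String>();
  for (String fieldName : mltFieldNames) {
    SchemaField sf = schema.getFieldOrNull(fieldName);
    if (sf != null && sf.getType() instanceof TrieField) {
      continue; // numeric token streams expose no CharTermAttribute; skip the field
    }
    kept.add(fieldName);
  }
  return kept;
}
{code}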

 MoreLikeThisHandler doesn't work with numeric or date fields in 4.x
 ---

 Key: SOLR-4860
 URL: https://issues.apache.org/jira/browse/SOLR-4860
 Project: Solr
  Issue Type: Bug
  Components: MoreLikeThis
Affects Versions: 4.2
Reporter: Thomas Seidl

 After upgrading to Solr 4.2 (from 3.x), I realized that my MLT queries no 
 longer work. It happens if I pass an integer ({{solr.TrieIntField}}), float 
 ({{solr.TrieFloatField}}) or date ({{solr.DateField}}) field as part of the 
 {{mlt.fl}} parameter. The field's {{multiValued}} setting doesn't seem to 
 matter.
 This is the error I get:
 {noformat}
 NumericTokenStream does not support CharTermAttribute.
 java.lang.IllegalArgumentException: NumericTokenStream does not support 
 CharTermAttribute.
   at 
 org.apache.lucene.analysis.NumericTokenStream$NumericAttributeFactory.createAttributeInstance(NumericTokenStream.java:136)
   at 
 org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271)
   at 
 org.apache.lucene.queries.mlt.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:781)
   at 
 org.apache.lucene.queries.mlt.MoreLikeThis.retrieveTerms(MoreLikeThis.java:724)
   at 
 org.apache.lucene.queries.mlt.MoreLikeThis.like(MoreLikeThis.java:578)
   at 
 org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:348)
   at 
 org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:167)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
   at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
   at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
   at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
   at 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
   at 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
   at 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
   at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
   at 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
   at 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
   at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
   at 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
   at 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
   at 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
   at org.eclipse.jetty.server.Server.handle(Server.java:365)
   at 
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
   at 
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
   at 
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
   at 
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
   at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
   at 
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
   at 
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
   at 
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
   at 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
   at java.lang.Thread.run(Thread.java:679)
 {noformat}
 The configuration I use can be found here: 
 

[jira] [Updated] (SOLR-4777) Handle SliceState in the Admin UI

2013-07-18 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-4777:


Affects Version/s: 4.3
Fix Version/s: 4.5

 Handle SliceState in the Admin UI
 -

 Key: SOLR-4777
 URL: https://issues.apache.org/jira/browse/SOLR-4777
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud, web gui
Affects Versions: 4.3
Reporter: Anshum Gupta
 Fix For: 4.5


 The Solr admin UI as of now does not take Slice state into account.
 We need to have that differentiated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4777) Handle SliceState in the Admin UI

2013-07-18 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-4777:


Description: 
The Solr admin UI as of now does not take Slice state into account.
We need to have that differentiated.

There are three states:
# The default is ACTIVE
# CONSTRUCTION (used during shard splitting for new sub shards), and
# INACTIVE - the parent shard is set to this state after split is complete

A slice/shard which is INACTIVE will not accept traffic (i.e. it will 
re-route traffic to sub shards) even though the nodes inside this shard show up 
as green.

We should show the INACTIVE shards in a different color to highlight this 
behavior.

  was:
The Solr admin UI as of now does not take Slice state into account.
We need to have that differentiated.




 Handle SliceState in the Admin UI
 -

 Key: SOLR-4777
 URL: https://issues.apache.org/jira/browse/SOLR-4777
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud, web gui
Affects Versions: 4.3
Reporter: Anshum Gupta
 Fix For: 4.5


 The Solr admin UI as of now does not take Slice state into account.
 We need to have that differentiated.
 There are three states:
 # The default is ACTIVE
 # CONSTRUCTION (used during shard splitting for new sub shards), and
 # INACTIVE - the parent shard is set to this state after split is complete
 A slice/shard which is INACTIVE will not accept traffic (i.e. it will 
 re-route traffic to sub shards) even though the nodes inside this shard show 
 up as green.
 We should show the INACTIVE shards in a different color to highlight this 
 behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-07-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712377#comment-13712377
 ] 

ASF subversion and git services commented on LUCENE-5030:
-

Commit 1504490 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1504490 ]

LUCENE-5030: FuzzySuggester can optionally measure edits in Unicode code points 
instead of UTF8 bytes

 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
Assignee: Michael McCandless
 Fix For: 5.0, 4.4

 Attachments: benchmark-INFO_SEP.txt, benchmark-old.txt, 
 benchmark-wo_convertion.txt, LUCENE-5030.patch, LUCENE-5030.patch, 
 LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, 
 nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, 
 nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, 
 nonlatin_fuzzySuggester_combo1.patch, nonlatin_fuzzySuggester_combo2.patch, 
 nonlatin_fuzzySuggester_combo.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, 
 run-suggest-benchmark.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-07-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712380#comment-13712380
 ] 

ASF subversion and git services commented on LUCENE-5030:
-

Commit 1504492 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1504492 ]

LUCENE-5030: FuzzySuggester can optionally measure edits in Unicode code points 
instead of UTF8 bytes

 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
Assignee: Michael McCandless
 Fix For: 5.0, 4.4

 Attachments: benchmark-INFO_SEP.txt, benchmark-old.txt, 
 benchmark-wo_convertion.txt, LUCENE-5030.patch, LUCENE-5030.patch, 
 LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, 
 nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, 
 nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, 
 nonlatin_fuzzySuggester_combo1.patch, nonlatin_fuzzySuggester_combo2.patch, 
 nonlatin_fuzzySuggester_combo.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, 
 run-suggest-benchmark.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-07-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-5030.


   Resolution: Fixed
Fix Version/s: (was: 4.4)
   4.5

OK I committed the last patch with a few small fixes:

  * Added @lucene.experimental to FuzzySuggester

  * Removed the added ctor (so we have just two ctors: the easy one,
which uses all defaults, and the expert one, where you specify
everything)

  * Removed System.out.printlns from the test

Thanks Artem!


 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
Assignee: Michael McCandless
 Fix For: 5.0, 4.5

 Attachments: benchmark-INFO_SEP.txt, benchmark-old.txt, 
 benchmark-wo_convertion.txt, LUCENE-5030.patch, LUCENE-5030.patch, 
 LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, 
 nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, 
 nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, 
 nonlatin_fuzzySuggester_combo1.patch, nonlatin_fuzzySuggester_combo2.patch, 
 nonlatin_fuzzySuggester_combo.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, 
 run-suggest-benchmark.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica

2013-07-18 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712400#comment-13712400
 ] 

Markus Jelsma commented on SOLR-4260:
-

Alright, nothing looks like zookeeper expirations; i grepped for expirations in 
the error log but there's nothing there. This indexing session did not produce 
as many inconsistencies as the previous one; there is only 1 shard of which one 
replica has 2 more documents. It won't fix itself.

During indexing there were, as usual, errors such as autocommit causing too 
many searchers, and time outs talking to other nodes.

Only 2 nodes report a "Stopping Recovery" message, of which one node actually 
has a replica of the inconsistent core. The other shard seems fine; both 
replicas have the same numDocs.

 Inconsistent numDocs between leader and replica
 ---

 Key: SOLR-4260
 URL: https://issues.apache.org/jira/browse/SOLR-4260
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 5.0


 After wiping all cores and reindexing some 3.3 million docs from Nutch using 
 CloudSolrServer we see inconsistencies between the leader and replica for 
 some shards.
 Each core holds about 3.3k documents. For some reason 5 out of 10 shards have 
 a small deviation in the number of documents. The leader and slave deviate 
 by roughly 10-20 documents, not more.
 Results hopping ranks in the result set for identical queries got my 
 attention: there were small IDF differences for exactly the same record, 
 causing a record to shift positions in the result set. During those tests no 
 records were indexed. Consecutive catch-all queries also return different 
 numbers of numDocs.
 We're running a 10 node test cluster with 10 shards and a replication factor 
 of two and frequently reindex using a fresh build from trunk. I've not seen 
 this issue for quite some time until a few days ago.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-5091) Modify SpanNotQuery to act as SpanNotNearQuery too

2013-07-18 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-5091:


Assignee: David Smiley

 Modify SpanNotQuery to act as SpanNotNearQuery too
 --

 Key: LUCENE-5091
 URL: https://issues.apache.org/jira/browse/LUCENE-5091
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: 4.3.1
Reporter: Tim Allison
Assignee: David Smiley
Priority: Minor
 Fix For: 4.5

 Attachments: LUCENE-5091.patch.txt


 With very small modifications, SpanNotQuery can act as a SpanNotNearQuery.
 To find "a" but not if "b" appears 3 tokens before or 4 tokens after "a":
 new SpanNotQuery(a, b, 3, 4)
 Original constructor still exists and calls SpanNotQuery(a, b, 0, 0).
 Patch with tests on way.
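
 A hedged usage sketch of the proposed 4-arg constructor (field and term
 names are illustrative; this constructor exists only in the attached patch):

 SpanQuery include = new SpanTermQuery(new Term("f", "a"));
 SpanQuery exclude = new SpanTermQuery(new Term("f", "b"));
 // match "a" unless "b" occurs within 3 tokens before or 4 tokens after
 SpanNotQuery q = new SpanNotQuery(include, exclude, 3, 4);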

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-07-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712408#comment-13712408
 ] 

Uwe Schindler commented on LUCENE-5030:
---

JUH! :-) Thanks for heavy committing - it took a long time, but now it is 
good! Many thanks, Uwe

 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
Assignee: Michael McCandless
 Fix For: 5.0, 4.5

 Attachments: benchmark-INFO_SEP.txt, benchmark-old.txt, 
 benchmark-wo_convertion.txt, LUCENE-5030.patch, LUCENE-5030.patch, 
 LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, 
 nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, 
 nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, 
 nonlatin_fuzzySuggester_combo1.patch, nonlatin_fuzzySuggester_combo2.patch, 
 nonlatin_fuzzySuggester_combo.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, 
 run-suggest-benchmark.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (64bit/ibm-j9-jdk7) - Build # 6622 - Failure!

2013-07-18 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/6622/
Java: 64bit/ibm-j9-jdk7 
-Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;}

1 tests failed.
REGRESSION:  org.apache.solr.core.TestJmxIntegration.testJmxRegistration

Error Message:
No SolrDynamicMBeans found

Stack Trace:
java.lang.AssertionError: No SolrDynamicMBeans found
at 
__randomizedtesting.SeedInfo.seed([ABCC33B5C702AC02:251D578FAA43F467]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.solr.core.TestJmxIntegration.testJmxRegistration(TestJmxIntegration.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:88)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:613)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:780)




Build Log:
[...truncated 9411 lines...]
   [junit4] Suite: 

Re: list-unsubscr...@apache.org

2013-07-18 Thread Chris Hostetter

: Subject: list-unsubscr...@apache.org

If anyone wishes to unsubscribe, you need to *send* an email to the 
unsubscribe address, not put it in the subject of a reply.

the specifics of how to unsubscribe are listed in the footer of every 
email to the list(s) you are on...

:  -
:  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
:  For additional commands, e-mail: dev-h...@lucene.apache.org


-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2013-07-18 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4542:
-

Assignee: Adrien Grand  (was: Chris Male)

 Make RECURSION_CAP in HunspellStemmer configurable
 --

 Key: LUCENE-4542
 URL: https://issues.apache.org/jira/browse/LUCENE-4542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Adrien Grand
 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, 
 LUCENE-4542-with-solr.patch


 Currently there is 
 private static final int RECURSION_CAP = 2;
 in the code of the class HunspellStemmer. It makes using hunspell with 
 several dictionaries almost unusable, due to bad performance (f.ex. it costs 
 36ms to stem a long sentence in Latvian for recursion_cap=2 and 5 ms for 
 recursion_cap=1). It would be nice to be able to tune this number as needed.
 AFAIK this number (2) was chosen arbitrarily.
 (it's my first issue, so please forgive me any mistakes).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4734) FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight

2013-07-18 Thread Ryan Lauck (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712427#comment-13712427
 ] 

Ryan Lauck commented on LUCENE-4734:


Thanks Adrien!

I agree about LUCENE-2878. I came to the same conclusion before finding that 
someone had already done most of the work: the ideal scenario is to 
(optionally) pull postings or term vectors in addition to payloads while 
scoring, and expose them for highlighting. I'm looking forward to that patch too!

An idea I began working on but haven't polished enough to submit a patch for:

Users of the API could access raw highlight metadata (offsets and positions) 
and could additionally post-process it to merge/filter/ignore overlapping highlights - 
one flaw I've had to work around in existing highlighters is that when 
highlights overlap they either merge them or toss all but the first 
encountered. We perform the highlighting manually in our system and hope to one 
day allow end users to toggle which terms are highlighted without having to 
make round-trips to the server to modify the search criteria and rerun the 
highlighter. With raw offset data this is trivial and merging/discarding 
overlaps can be handled in client-side code. There are additional advantages 
too such as being able to highlight find-in-page or search-within-search 
results and only having to transfer new offset metadata rather than the entire 
text over the wire (we have some very big 100MB+ documents).

 FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight
 

 Key: LUCENE-4734
 URL: https://issues.apache.org/jira/browse/LUCENE-4734
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.0, 4.1, 5.0
Reporter: Ryan Lauck
  Labels: fastvectorhighlighter, highlighter
 Fix For: 4.4

 Attachments: lucene-4734.patch, LUCENE-4734.patch


 If a proximity phrase query overlaps with any other query term it will not be 
 highlighted.
 Example Text:  A B C D E F G
 Example Queries: 
 "B E"~10 D
 ("D" will be highlighted instead of "B C D E")
 "B E"~10 "C F"~10
 (nothing will be highlighted)
 This can be traced to the FieldPhraseList constructor's inner while loop. 
 From the first example query, the first TermInfo popped off the stack will be 
 "B". The second TermInfo will be "D", which will not be found in the submap 
 for "B E"~10 and will trigger a failed match.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set

2013-07-18 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712432#comment-13712432
 ] 

Erick Erickson commented on SOLR-4478:
--

OK, reconstructing a chat exchange:

Sharing the underlying solrconfig objects looks like it's more difficult than I 
thought, with some interesting corner cases that would be difficult, i.e. ${} 
substitutions, resource loader being shared, etc. Also, the individual core 
properties are embedded in the Config object, so keeping these separate is 
another source of getting code wrong.

Not to mention that the code changes would be more extensive than anyone had 
hoped.

At least the use-case of opening a core and actively using it for a while then 
moving on is handled by the lazy/transient core opportunities.

There is historical evidence that a significant amount of CPU resources are 
consumed by opening/closing cores 100s of times a second, so that scenario is 
still out there.

The net-net is that it's probably not worth the effort right now to really 
share the underlying solrConfig object across cores, too many ways to go wrong. 
The refactoring that's been done should make this easier if we decide to do it 
in the future.



 Allow cores to specify a named config set
 -

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc... I propose an 
 additional parameter specified on the core node in solr.xml or as a 
 parameter in the discovery mode core.properties file, call it configSet, 
 where the value provided is a path to a directory, either absolute or 
 relative. Really, this is as though you copied the conf directory somewhere 
 to be used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for comments. I know that there are already 
 thoughts about how this all should work floating around, so before I start 
 any work on this I thought I'd at least get an idea of whether this is the 
 way people are thinking about going.
 Configset can be either a relative or absolute path, if relative it's assumed 
 to be relative to solr_home.
 Thoughts?
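
 A sketch of a discovery-mode core.properties under this straw-man (property
 name as proposed above; final syntax may differ):

 name=mycore
 configSet=myconf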

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-460) hashCode improvements

2013-07-18 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712434#comment-13712434
 ] 

David Smiley commented on LUCENE-460:
-

I am not an expert on hashCode generation, yet as any java developer I have to 
generate hash codes.  I typically leave this to my IDE, IntelliJ.  As I find 
the need to update a hashCode, *do you think it's bad form for me to outright 
replace an existing hashCode implementation you wrote that looks complicated to 
me with what IntelliJ generates?*:  Here's a specific example:

SpanNotQuery formerly:
{code:java}
int h = include.hashCode();
h = (h << 1) | (h >>> 31);  // rotate left
h ^= exclude.hashCode();
h = (h << 1) | (h >>> 31);  // rotate left
h ^= Float.floatToRawIntBits(getBoost());
return h;
{code}

IntelliJ will generate a hashCode for this + a new pre & post pair of integer 
fields I'm adding via LUCENE-5091:
{code:java}
int result = super.hashCode();
result = 31 * result + include.hashCode();
result = 31 * result + exclude.hashCode();
result = 31 * result + pre;
result = 31 * result + post;
return result;
{code}

Now that's a hashCode implementation I can understand, and I don't question 
its validity because IntelliJ always generates them in a consistent fashion 
that I am used to seeing.  Your hashCode might be better, but I simply don't 
understand and thus can't maintain it.  Do you want me to consult you (or an 
applicable author of a confusing hashCode in general) every time?  Granted this 
doesn't happen often.


 hashCode improvements
 -

 Key: LUCENE-460
 URL: https://issues.apache.org/jira/browse/LUCENE-460
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Yonik Seeley
Assignee: Yonik Seeley
Priority: Minor

 It would be nice for all Query classes to implement hashCode and equals to 
 enable them to be used as keys when caching.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set

2013-07-18 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712436#comment-13712436
 ] 

Erick Erickson commented on SOLR-4478:
--

I just had a bright idea, so I'll put it out there so someone can shoot it 
down. It seems like sharing the underlying solrConfig object is fraught with 
problems, but could we get an easy win by just sharing the parsed DOM object in 
each config set (and really, the same for the schema object)? I don't have any 
measurements for what percentage of schema loading is spent in raw XML parsing, 
so I can't really say how much of a win this would be. But if it's easy/safe it 
might be worth considering.
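
A minimal sketch of that DOM-sharing idea (names are illustrative, not part of 
any patch; one thing that would need verifying is that w3c DOM Documents are 
safe for the concurrent read access multiple cores would perform):

{code:java}
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class SharedDomCache {
  private final Map<String, Document> cache = new HashMap<String, Document>();

  // Parse each config file once and hand the same DOM to every core that
  // references the same config set.
  public synchronized Document get(String path) throws Exception {
    Document doc = cache.get(path);
    if (doc == null) {
      doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().parse(new File(path));
      cache.put(path, doc);
    }
    return doc;
  }
}
{code}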

 Allow cores to specify a named config set
 -

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc.: I propose an 
 additional parameter specified on the core node in solr.xml, or as a 
 parameter in the discovery-mode core.properties file; call it configSet. The 
 value provided is a path to a directory, either absolute or relative. 
 Really, this is as though you copied the conf directory somewhere to be 
 used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for comments. I know that there are already 
 thoughts about how this all should work floating around, so before I start 
 any work on this I thought I'd at least get an idea of whether this is the 
 way people are thinking about going.
 The configSet value can be either a relative or an absolute path; if relative, 
 it's assumed to be relative to solr_home.
 Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5091) Modify SpanNotQuery to act as SpanNotNearQuery too

2013-07-18 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712440#comment-13712440
 ] 

David Smiley commented on LUCENE-5091:
--

Looks good, Tim, except for one thing: the way you incorporated pre & post 
into the hashCode is bad, as another, unequal query with the pre and post 
values flipped would have the same hashCode.  I'm consulting the other devs on 
https://issues.apache.org/jira/browse/LUCENE-460?focusedCommentId=13712434&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13712434
about a suitable replacement, which will block me from committing this for the 
moment.
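
For concreteness, a small sketch of the collision (illustrative values only; 
17 is just an arbitrary seed, and XOR stands in for whichever symmetric 
combination the patch used):

{code:java}
public class HashCollisionSketch {
  public static void main(String[] args) {
    // A symmetric combination of pre and post collides when the two values
    // are swapped on an otherwise unequal query:
    System.out.println((3 ^ 4) == (4 ^ 3));          // true: collision
    // The order-sensitive 31*result scheme keeps swapped values distinct:
    System.out.println(31 * (31 * 17 + 3) + 4);      // 16434
    System.out.println(31 * (31 * 17 + 4) + 3);      // 16464
  }
}
{code}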

Also, I updated the package.html summary with a new description matching the 
class javadocs:
{code:html}
<li>A {@link org.apache.lucene.search.spans.SpanNotQuery SpanNotQuery} removes 
spans matching one {@link org.apache.lucene.search.spans.SpanQuery SpanQuery} 
which overlap (or come near) another.  This can be used, e.g., to implement 
within-paragraph search.</li>
{code}

 Modify SpanNotQuery to act as SpanNotNearQuery too
 --

 Key: LUCENE-5091
 URL: https://issues.apache.org/jira/browse/LUCENE-5091
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: 4.3.1
Reporter: Tim Allison
Assignee: David Smiley
Priority: Minor
 Fix For: 4.5

 Attachments: LUCENE-5091.patch.txt


 With very small modifications, SpanNotQuery can act as a SpanNotNearQuery.
 To find "a" but not if "b" appears 3 tokens before or 4 tokens after "a":
 new SpanNotQuery(a, b, 3, 4)
 The original constructor still exists and calls SpanNotQuery(a, b, 0, 0).
 Patch with tests on the way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion

2013-07-18 Thread Hoss Man (JIRA)
Hoss Man created SOLR-5048:
--

 Summary: fail the build if the example solrconfig.xml files don't 
have an up to date luceneMatchVersion
 Key: SOLR-5048
 URL: https://issues.apache.org/jira/browse/SOLR-5048
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man


4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... 
the build should fail in a situation like this.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion

2013-07-18 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5048:
---

Attachment: SOLR-5048.patch

all the necessary info is already in the build files ... we just need to tweak 
the format and check the files.

 fail the build if the example solrconfig.xml files don't have an up to date 
 luceneMatchVersion
 --

 Key: SOLR-5048
 URL: https://issues.apache.org/jira/browse/SOLR-5048
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: SOLR-5048.patch


 4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... 
 the build should fail in a situation like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion

2013-07-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712459#comment-13712459
 ] 

Uwe Schindler commented on SOLR-5048:
-

I would change the example solrconfig.xml files to use the {{x.y}} version 
format instead of {{LUCENE_xy}}. This would make the sanity check much simpler 
(you just have to scan the configs for the {{${tests.luceneMatchVersion}}} 
string).

Solr's XML parser has supported the plain-text version format since the 
beginning of the {{Version}} class.

 fail the build if the example solrconfig.xml files don't have an up to date 
 luceneMatchVersion
 --

 Key: SOLR-5048
 URL: https://issues.apache.org/jira/browse/SOLR-5048
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: SOLR-5048.patch


 4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... 
 the build should fail in a situation like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5049) switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY

2013-07-18 Thread Hoss Man (JIRA)
Hoss Man created SOLR-5049:
--

 Summary: switch to using luceneMatchVersion X.Y in example 
configs instead of LUCENE_XY
 Key: SOLR-5049
 URL: https://issues.apache.org/jira/browse/SOLR-5049
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.4


Uwe just pointed out to me on IRC that you can specify 
{{<luceneMatchVersion/>}} using X.Y instead of the more 
internal-Java-variable-esque LUCENE_XY.

I have no idea why we haven't been doing this in the past ... it makes so much 
more sense for end users; we should absolutely do this moving forward.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion

2013-07-18 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712464#comment-13712464
 ] 

Hoss Man commented on SOLR-5048:


Opened a blocker issue, SOLR-5049, to switch to this better format in the 
example configs -- it's a good idea in general and it drastically simplifies 
the check we have to do for this issue.

 fail the build if the example solrconfig.xml files don't have an up to date 
 luceneMatchVersion
 --

 Key: SOLR-5048
 URL: https://issues.apache.org/jira/browse/SOLR-5048
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: SOLR-5048.patch


 4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... 
 the build should fail in a situation like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion

2013-07-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712470#comment-13712470
 ] 

Uwe Schindler commented on SOLR-5048:
-

To conclude: the patch is fine; just remove the part inside common-build.xml 
and use the test property inside the Solr validate check. You have to fix the 
solrconfig files in any case to use the {{x.y}} format.

 fail the build if the example solrconfig.xml files don't have an up to date 
 luceneMatchVersion
 --

 Key: SOLR-5048
 URL: https://issues.apache.org/jira/browse/SOLR-5048
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: SOLR-5048.patch


 4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... 
 the build should fail in a situation like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Programmatic Synonyms Filter (Lucene and/or Solr)

2013-07-18 Thread Walter Underwood
There are two serious issues with query-time synonyms: speed and correctness.

1. Expanding a term to 1000 synonyms at query time means 1000 term lookups. 
This will not be fast. Expanding the term at index time means 1000 posting list 
entries, but only one term lookup at query time.

2. Query-time expansion will give higher scores to the rarer synonyms. This 
is almost never what you want. If I make "TV" and "television" synonyms, I want 
them both to score the same. But if TV is 10X more common than television, then 
documents with the rare term (television) will score better.
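
To put numbers on that skew (an illustration using the classic idf = 
log10(N/df), not Lucene's exact formula): in a 10M-document index with 
df(TV) = 1,000,000 and df(television) = 100,000, idf(TV) = 1 while 
idf(television) = 2, so the rarer synonym contributes twice the weight.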

wunder

On Jul 18, 2013, at 5:54 AM, Shai Erera wrote:

 The examples I've seen so far are single words. But I learned today something 
 new .. the number of synonyms returned for a word may be in the range of 
 hundreds, sometimes even thousands.
 So I'm not sure query-time synonyms may work at all .. what do you think?
 
 Shai
 
 
 On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky j...@basetechnology.com 
 wrote:
 Your best bet is to preprocess queries and expand synonyms in your own 
 application layer. The Lucene/Solr synonym implementation, design, and 
 architecture are fairly lightweight (although FST is a big improvement) and 
 not architected for large and dynamic synonym sets.
  
 Do you need multi-word phrase synonyms as well, or is this strictly 
 single-word synonyms?
 
 -- Jack Krupansky
  
 From: Shai Erera
 Sent: Thursday, July 18, 2013 1:36 AM
 To: dev@lucene.apache.org
 Subject: Programmatic Synonyms Filter (Lucene and/or Solr)
  
 Hi
 
 I was asked to integrate with a system which provides synonyms for words 
 through API. I checked the existing synonym filters in Lucene and Solr and 
 they all seem to take a synonyms map up front. 
 
 E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's 
 not really programmatic in the sense that I can provide an impl which will 
 pull the synonyms through the other system's API.
 
 Solr SynonymFilterFactory just loads the synonyms from a file into a 
 SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I 
 can extend that one either.
 
 The problem is that the synonyms DB I should integrate with is HUGE and will 
 probably not fit in RAM (SynonymMap). Nor is it currently possible to pull 
 all available synonyms from it in one go. The API I have is something like 
 String[] getSynonyms(String word).
 
 So I have a few questions:
 
 1) Did I miss a Filter which does take a programmatic syn-map which I can 
 provide my own impl to?
 
 2) If not, would it make sense to modify SynonymMap to offer a 
 getSynonyms(word) API (using BytesRef / CharsRef of course), with an 
 FSTSynonymMap default impl, so that users can provide their own impl, e.g. not 
 requiring everything to be in RAM?
 
 2.1) A side-effect benefit, I think, is that we won't require everyone to deal 
 with the FST API that way, though I'll admit I cannot think of many use cases 
 for not using SynonymFilter as-is ...
 
 3) If the answer to (1) and (2) is NO, I guess my only option is to implement 
 my own SynonymFilter, copying most of the code from Lucene's ... right?
 
 Shai
 

--
Walter Underwood
wun...@wunderwood.org
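
For the single-word, 1:1 case discussed in this thread, a custom TokenFilter 
along these lines might be all that's needed -- a hedged sketch only 
(ApiSynonymFilter and SynonymLookup are hypothetical; getSynonyms is the 
external API described above; no multi-word phrases, weights, or graph 
handling):

{code:java}
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class ApiSynonymFilter extends TokenFilter {
  public interface SynonymLookup { String[] getSynonyms(String word); }

  private final SynonymLookup lookup;
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private final Deque<String> pending = new ArrayDeque<String>();

  public ApiSynonymFilter(TokenStream input, SynonymLookup lookup) {
    super(input);
    this.lookup = lookup;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!pending.isEmpty()) {
      // Emit a queued synonym stacked on the original token's position;
      // offsets are left as the original token's.
      termAtt.setEmpty().append(pending.poll());
      posIncAtt.setPositionIncrement(0);
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    for (String syn : lookup.getSynonyms(termAtt.toString())) {
      pending.add(syn);
    }
    return true; // original token first, synonyms on later calls
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending.clear();
  }
}
{code}

At index time this bloats postings (Shai's concern later in the thread); at 
query time the same filter drives the per-term lookup cost Walter describes, 
so the sketch itself is neutral on where to apply it.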





[jira] [Commented] (SOLR-5049) switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY

2013-07-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712479#comment-13712479
 ] 

Uwe Schindler commented on SOLR-5049:
-

I am sorry for not making this public back in the 1.4 (or 3.0) days, when I 
first committed the lenient parser using the regex! :-)

 switch to using luceneMatchVersion X.Y in example configs instead of 
 LUCENE_XY
 

 Key: SOLR-5049
 URL: https://issues.apache.org/jira/browse/SOLR-5049
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.4


 Uwe just pointed out to me on IRC that you can specify 
 {{<luceneMatchVersion/>}} using X.Y instead of the more 
 internal-Java-variable-esque LUCENE_XY.
 I have no idea why we haven't been doing this in the past ... it makes so 
 much more sense for end users; we should absolutely do this moving forward.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-07-18 Thread Artem Lukanin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712488#comment-13712488
 ] 

Artem Lukanin commented on LUCENE-5030:
---

Great! Thanks for reviewing.

 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
Assignee: Michael McCandless
 Fix For: 5.0, 4.5

 Attachments: benchmark-INFO_SEP.txt, benchmark-old.txt, 
 benchmark-wo_convertion.txt, LUCENE-5030.patch, LUCENE-5030.patch, 
 LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, LUCENE-5030.patch, 
 nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, 
 nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, 
 nonlatin_fuzzySuggester_combo1.patch, nonlatin_fuzzySuggester_combo2.patch, 
 nonlatin_fuzzySuggester_combo.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch, 
 run-suggest-benchmark.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none
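
A rough sketch of the conversion pipeline the description calls for, using the 
existing automaton utilities (illustrative only; the real fix also has to 
change TokenStreamToAutomaton, which isn't shown):

{code:java}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.LevenshteinAutomata;
import org.apache.lucene.util.automaton.UTF32ToUTF8;

public class FuzzyPipelineSketch {
  static Automaton levenshteinUtf8(String term, int maxEdits) {
    // Build the LevN automaton over Unicode code points, as FuzzyQuery does...
    Automaton levUnicode =
        new LevenshteinAutomata(term, true).toAutomaton(maxEdits);
    // ...then convert to UTF-8 byte space so it can be intersected with the
    // suggester's byte-based FST.
    return new UTF32ToUTF8().convert(levUnicode);
  }
}
{code}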

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2013-07-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712496#comment-13712496
 ] 

ASF subversion and git services commented on LUCENE-4542:
-

Commit 1504529 from [~jpountz] in branch 'dev/trunk'
[ https://svn.apache.org/r1504529 ]

LUCENE-4542: Make HunspellStemFilter's maximum recursion level configurable.

 Make RECURSION_CAP in HunspellStemmer configurable
 --

 Key: LUCENE-4542
 URL: https://issues.apache.org/jira/browse/LUCENE-4542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Adrien Grand
 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, 
 LUCENE-4542-with-solr.patch


 Currently there is 
 private static final int RECURSION_CAP = 2;
 in the code of the class HunspellStemmer. It makes using Hunspell with 
 several dictionaries almost unusable, due to bad performance (f.ex. it costs 
 36ms to stem a long sentence in Latvian with recursion_cap=2 and 5ms with 
 recursion_cap=1). It would be nice to be able to tune this number as needed.
 AFAIK this number (2) was chosen arbitrarily.
 (It's the first issue I've ever filed, so please forgive any mistakes.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Programmatic Synonyms Filter (Lucene and/or Solr)

2013-07-18 Thread Shai Erera
With index-time synonyms you bloat the index with a lot of new
postings, most of them just duplicates of each other. And in my case,
because every synonym has a weight, I cannot even consider postings
deduplication...

There's a tradeoff here (as usual). Both approaches have pros and cons. I
think index time is better in the end because a larger index can be solved
by throwing more hardware at it. But queries with thousands of terms are a
real issue.

One thing I can look at is whether the synonym sets can be 'grouped' in a way
that instead of all the terms I index a group ID or something, and then
during search I resolve a term to all the groups it may belong to... I'll
need to think about it more.
On Jul 18, 2013 7:49 PM, Walter Underwood wun...@wunderwood.org wrote:

 There are two serious issues with query-time synonyms: speed and
 correctness.

 1. Expanding a term to 1000 synonyms at query time means 1000 term
 lookups. This will not be fast. Expanding the term at index time means 1000
 posting list entries, but only one term lookup at query time.

 2. Query-time expansion will give higher scores to the rarer synonyms.
 This is almost never what you want. If I make "TV" and "television"
 synonyms, I want them both to score the same. But if TV is 10X more common
 than television, then documents with the rare term (television) will score
 better.

 wunder

 On Jul 18, 2013, at 5:54 AM, Shai Erera wrote:

 The examples I've seen so far are single words. But I learned today
 something new .. the number of synonyms returned for a word may be in the
 range of hundreds, sometimes even thousands.
 So I'm not sure query-time synonyms may work at all .. what do you think?

 Shai


 On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky 
 j...@basetechnology.com wrote:

   Your best bet is to preprocess queries and expand synonyms in your own
 application layer. The Lucene/Solr synonym implementation, design, and
 architecture are fairly lightweight (although FST is a big improvement) and
 not architected for large and dynamic synonym sets.

 Do you need multi-word phrase synonyms as well, or is this strictly
 single-word synonyms?

 -- Jack Krupansky

  *From:* Shai Erera ser...@gmail.com
 *Sent:* Thursday, July 18, 2013 1:36 AM
 *To:* dev@lucene.apache.org
 *Subject:* Programmatic Synonyms Filter (Lucene and/or Solr)

 Hi

 I was asked to integrate with a system which provides synonyms for words
 through API. I checked the existing synonym filters in Lucene and Solr and
 they all seem to take a synonyms map up front.

 E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so
 it's not really programmatic in the sense that I can provide an impl which
 will pull the synonyms through the other system's API.

 Solr SynonymFilterFactory just loads the synonyms from a file into a
 SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I
 can extend that one either.

 The problem is that the synonyms DB I should integrate with is HUGE and
 will probably not fit in RAM (SynonymMap). Nor is it currently possible to
 pull all available synonyms from it in one go. The API I have is something
 like String[] getSynonyms(String word).

 So I have a few questions:
 
 1) Did I miss a Filter which does take a programmatic syn-map which I can
 provide my own impl to?
 
 2) If not, would it make sense to modify SynonymMap to offer a
 getSynonyms(word) API (using BytesRef / CharsRef of course), with an
 FSTSynonymMap default impl, so that users can provide their own impl, e.g.
 not requiring everything to be in RAM?
 
 2.1) A side-effect benefit, I think, is that we won't require everyone to
 deal with the FST API that way, though I'll admit I cannot think of many use
 cases for not using SynonymFilter as-is ...
 
 3) If the answer to (1) and (2) is NO, I guess my only option is to
 implement my own SynonymFilter, copying most of the code from Lucene's ...
 right?

 Shai



 --
 Walter Underwood
 wun...@wunderwood.org






[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2013-07-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712503#comment-13712503
 ] 

ASF subversion and git services commented on LUCENE-4542:
-

Commit 1504531 from [~jpountz] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1504531 ]

LUCENE-4542: Make HunspellStemFilter's maximum recursion level configurable.

 Make RECURSION_CAP in HunspellStemmer configurable
 --

 Key: LUCENE-4542
 URL: https://issues.apache.org/jira/browse/LUCENE-4542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Adrien Grand
 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, 
 LUCENE-4542-with-solr.patch


 Currently there is 
 private static final int RECURSION_CAP = 2;
 in the code of the class HunspellStemmer. It makes using Hunspell with 
 several dictionaries almost unusable, due to bad performance (f.ex. it costs 
 36ms to stem a long sentence in Latvian with recursion_cap=2 and 5ms with 
 recursion_cap=1). It would be nice to be able to tune this number as needed.
 AFAIK this number (2) was chosen arbitrarily.
 (It's the first issue I've ever filed, so please forgive any mistakes.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2013-07-18 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-4542.
--

   Resolution: Fixed
Fix Version/s: 4.5

Committed, thanks!

 Make RECURSION_CAP in HunspellStemmer configurable
 --

 Key: LUCENE-4542
 URL: https://issues.apache.org/jira/browse/LUCENE-4542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Adrien Grand
 Fix For: 4.5

 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, 
 LUCENE-4542-with-solr.patch


 Currently there is 
 private static final int RECURSION_CAP = 2;
 in the code of the class HunspellStemmer. It makes using Hunspell with 
 several dictionaries almost unusable, due to bad performance (f.ex. it costs 
 36ms to stem a long sentence in Latvian with recursion_cap=2 and 5ms with 
 recursion_cap=1). It would be nice to be able to tune this number as needed.
 AFAIK this number (2) was chosen arbitrarily.
 (It's the first issue I've ever filed, so please forgive any mistakes.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Programmatic Synonyms Filter (Lucene and/or Solr)

2013-07-18 Thread Walter Underwood
Adding terms to posting lists is about the most space-efficient thing you can 
do in a search engine, so I would not worry too much about that.

wunder

On Jul 18, 2013, at 10:06 AM, Shai Erera wrote:

 With index-time synonyms you bloat the index with a lot of new postings, 
 most of them just duplicates of each other. And in my case, because every 
 synonym has a weight, I cannot even consider postings 
 deduplication...
 
 There's a tradeoff here (as usual). Both approaches have pros and cons. I 
 think index time is better in the end because a larger index can be solved by 
 throwing more hardware at it. But queries with thousands of terms are a real 
 issue.
 
 One thing I can look at is whether the synonym sets can be 'grouped' in a way 
 that instead of all the terms I index a group ID or something, and then during 
 search I resolve a term to all the groups it may belong to... I'll need to 
 think about it more.
 
 On Jul 18, 2013 7:49 PM, Walter Underwood wun...@wunderwood.org wrote:
 There are two serious issues with query-time synonyms: speed and correctness.
 
 1. Expanding a term to 1000 synonyms at query time means 1000 term lookups. 
 This will not be fast. Expanding the term at index time means 1000 posting 
 list entries, but only one term lookup at query time.
 
 2. Query-time expansion will give higher scores to the rarer synonyms. 
 This is almost never what you want. If I make "TV" and "television" synonyms, 
 I want them both to score the same. But if TV is 10X more common than 
 television, then documents with the rare term (television) will score better.
 
 wunder
 
 On Jul 18, 2013, at 5:54 AM, Shai Erera wrote:
 
 The examples I've seen so far are single words. But I learned today 
 something new .. the number of synonyms returned for a word may be in the 
 range of hundreds, sometimes even thousands.
 So I'm not sure query-time synonyms may work at all .. what do you think?
 
 Shai
 
 
 On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky j...@basetechnology.com 
 wrote:
 Your best bet is to preprocess queries and expand synonyms in your own 
 application layer. The Lucene/Solr synonym implementation, design, and 
 architecture are fairly lightweight (although FST is a big improvement) and 
 not architected for large and dynamic synonym sets.
  
 Do you need multi-word phrase synonyms as well, or is this strictly 
 single-word synonyms?
 
 -- Jack Krupansky
  
 From: Shai Erera
 Sent: Thursday, July 18, 2013 1:36 AM
 To: dev@lucene.apache.org
 Subject: Programmatic Synonyms Filter (Lucene and/or Solr)
  
 Hi
 
 I was asked to integrate with a system which provides synonyms for words 
 through API. I checked the existing synonym filters in Lucene and Solr and 
 they all seem to take a synonyms map up front. 
 
 E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's 
 not really programmatic in the sense that I can provide an impl which will 
 pull the synonyms through the other system's API.
 
 Solr SynonymFilterFactory just loads the synonyms from a file into a 
 SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I 
 can extend that one either.
 
 The problem is that the synonyms DB I should integrate with is HUGE and will 
 probably not fit in RAM (SynonymMap). Nor is it currently possible to pull 
 all available synonyms from it in one go. The API I have is something like 
 String[] getSynonyms(String word).
 
 So I have a few questions:
 
 1) Did I miss a Filter which does take a programmatic syn-map which I can 
 provide my own impl to?
 
 2) If not, would it make sense to modify SynonymMap to offer a 
 getSynonyms(word) API (using BytesRef / CharsRef of course), with an 
 FSTSynonymMap default impl, so that users can provide their own impl, e.g. 
 not requiring everything to be in RAM?
 
 2.1) A side-effect benefit, I think, is that we won't require everyone to deal 
 with the FST API that way, though I'll admit I cannot think of many use cases 
 for not using SynonymFilter as-is ...
 
 3) If the answer to (1) and (2) is NO, I guess my only option is to 
 implement my own SynonymFilter, copying most of the code from Lucene's ... 
 right?
 
 Shai
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 

--
Walter Underwood
wun...@wunderwood.org





[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Summary: Pluggable Analytics  (was: Aggregating Collectors and 
AggregatorComponent)

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Aggregating Collectors and AggregatorComponent

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Attachment: SOLR-5045.patch

Added SumQParserPlugin, which is a very simple *Aggregator* implementation. At 
this point this class exists only to show the mechanics of how an Aggregator 
works and to test the framework. It will be iterated on to make it more fully 
featured.
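
A hedged sketch of the collect-time mechanics (illustrative only, not the 
attached patch; SumCollector is hypothetical, and the PostFilter wiring that 
would create it is omitted): a DelegatingCollector reads a numeric docvalues 
field as hits flow through and keeps a running sum.

{code:java}
import java.io.IOException;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.solr.search.DelegatingCollector;

public class SumCollector extends DelegatingCollector {
  private final String field;
  private NumericDocValues values;
  private long sum;

  public SumCollector(String field) {
    this.field = field;
  }

  @Override
  public void setNextReader(AtomicReaderContext context) throws IOException {
    values = context.reader().getNumericDocValues(field);
    super.setNextReader(context);
  }

  @Override
  public void collect(int doc) throws IOException {
    if (values != null) {
      sum += values.get(doc); // aggregate at collect time
    }
    super.collect(doc);       // pass the hit down the collector chain
  }

  public long getSum() {
    return sum;
  }
}
{code}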

 Aggregating Collectors and AggregatorComponent
 --

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5049) switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY

2013-07-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712511#comment-13712511
 ] 

ASF subversion and git services commented on SOLR-5049:
---

Commit 1504532 from hoss...@apache.org in branch 'dev/branches/lucene_solr_4_4'
[ https://svn.apache.org/r1504532 ]

SOLR-5049: use '4.4' as luceneMatchVersion in all example solrconfig.xml files

 switch to using luceneMatchVersion X.Y in example configs instead of 
 LUCENE_XY
 

 Key: SOLR-5049
 URL: https://issues.apache.org/jira/browse/SOLR-5049
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.4


 Uwe just pointed out to me on IRC that you can specify 
 {{<luceneMatchVersion/>}} using X.Y instead of the more 
 internal-Java-variable-esque LUCENE_XY.
 I have no idea why we haven't been doing this in the past ... it makes so 
 much more sense for end users; we should absolutely do this moving forward.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Programmatic Synonyms Filter (Lucene and/or Solr)

2013-07-18 Thread Uwe Schindler
I think Jack is implicitly referring to Solr. In the case of a pure Lucene 
application without Solr, or a custom query parser plugged into Solr that does 
the query-time expansion, the limit is not the URL length (which only applies 
to Solr, as the query is part of the URL), but rather the fact that Lucene 
refuses to run with more than 1024 BQ clauses :-)
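
For reference, that ceiling is BooleanQuery's maximum clause count, which is 
adjustable (a minimal sketch; 4096 is an arbitrary value, and the setting is 
global to the JVM):

{code:java}
import org.apache.lucene.search.BooleanQuery;

public class MaxClausesSketch {
  public static void main(String[] args) {
    // Default is 1024; rewriting a query beyond it throws
    // BooleanQuery.TooManyClauses.
    BooleanQuery.setMaxClauseCount(4096);
  }
}
{code}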

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Thursday, July 18, 2013 4:05 PM
To: dev@lucene.apache.org
Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr)

 

Container (e.g., Tomcat) limit. Configurable. I don’t recall the specifics.


-- Jack Krupansky

 

From: Shai Erera ser...@gmail.com 

Sent: Thursday, July 18, 2013 9:46 AM

To: dev@lucene.apache.org 

Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr)

 

Actually, after chatting w/ Mike about it, he made a good point about making 
SynMap expose an API like lookup(word): that doesn't work with multi-word 
synonyms (e.g. wi fi -> wifi). So I no longer think we should change 
SynFilter. Since in my case it's 1:1 (so I've learned so far), I should write 
my own TokenFilter.

So now the question is whether to do it at indexing time or search time. Each 
has pros and cons. I'll need to learn more about the DB first, e.g. how many 
words have only tens of synonyms and how many have thousands. I suspect there's 
no single solution here, so I will need to experiment with both.

Jack, I didn't quite follow the 2048 common limit -- is it a Solr limit of some 
sort? If so, can you please elaborate?

Shai

 

On Thu, Jul 18, 2013 at 4:12 PM, Jack Krupansky j...@basetechnology.com wrote:

Maybe a custom search component would be in order, to “enrich” the incoming 
query. Again, preprocessing the query for synonym expansion before Solr parses 
it. It could call the external synonym API and cache synonyms as well.

 

But, I’d still lean towards preprocessing in an application layer. Although, 
for hundreds or thousands of synonyms it would probably hit the common 
2048-character limit for URLs in some containers, which would need to be raised.


-- Jack Krupansky

 

From: Shai Erera ser...@gmail.com 

Sent: Thursday, July 18, 2013 8:54 AM

To: dev@lucene.apache.org 

Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr)

 

The examples I've seen so far are single words. But I learned today something 
new .. the number of synonyms returned for a word may be in the range of 
hundreds, sometimes even thousands.

So I'm not sure query-time synonyms may work at all .. what do you think?

Shai

 

On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky j...@basetechnology.com wrote:

Your best bet is to preprocess queries and expand synonyms in your own 
application layer. The Lucene/Solr synonym implementation, design, and 
architecture are fairly lightweight (although FST is a big improvement) and not 
architected for large and dynamic synonym sets.

 

Do you need multi-word phrase synonyms as well, or is this strictly single-word 
synonyms?


-- Jack Krupansky

 

From: Shai Erera ser...@gmail.com 

Sent: Thursday, July 18, 2013 1:36 AM

To: dev@lucene.apache.org 

Subject: Programmatic Synonyms Filter (Lucene and/or Solr)

 

Hi

I was asked to integrate with a system which provides synonyms for words 
through API. I checked the existing synonym filters in Lucene and Solr and they 
all seem to take a synonyms map up front. 

E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so it's 
not really programmatic in the sense that I can provide an impl which will pull 
the synonyms through the other system's API.

Solr SynonymFilterFactory just loads the synonyms from a file into a 
SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can 
extend that one either.

The problem is that the synonyms DB I should integrate with is HUGE and will 
probably not fit in RAM (SynonymMap). Nor is it currently possible to pull all 
available synonyms from it in one go. The API I have is something like String[] 
getSynonyms(String word).

So I have a few questions:

1) Did I miss a Filter which does take a programmatic syn-map which I can 
provide my own impl to?

2) If not, would it make sense to modify SynonymMap to offer a getSynonyms(word) 
API (using BytesRef / CharsRef of course), with an FSTSynonymMap default impl, 
so that users can provide their own impl, e.g. not requiring everything to be 
in RAM?

2.1) A side-effect benefit, I think, is that we won't require everyone to deal 
with the FST API that way, though I'll admit I cannot think of many use cases 
for not using SynonymFilter as-is ...

 

3) If the answer to (1) and (2) is NO, I guess my only option is to implement 
my own SynonymFilter, copying most of the code from Lucene's ... right?

Shai

 

 



[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion

2013-07-18 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712517#comment-13712517
 ] 

Jack Krupansky commented on SOLR-5048:
--

On the ReleaseToDo wiki I see this step: "Find/replace LUCENE_XY -> 
LUCENE_X(Y+1) across all of Lucene and Solr, but do NOT change usages under 
lucene/analysis/ that allow version-specific behavior", but when I look at the 
example solrconfig.xml in branch_4x, it still says LUCENE_43, suggesting that 
this step has been skipped twice now. I think it should say 4.5, since 
LUCENE_45 is the only un-deprecated version for branch_4x now, right?

Maybe the step "After creating a new release branch, update the version in the 
base release branch (e.g. branches/branch_4x)" needed to be reviewed (or merely 
followed) a little more closely.

See:
http://wiki.apache.org/lucene-java/ReleaseTodo#Branching_.26_Feature_Freeze


 fail the build if the example solrconfig.xml files don't have an up to date 
 luceneMatchVersion
 --

 Key: SOLR-5048
 URL: https://issues.apache.org/jira/browse/SOLR-5048
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: SOLR-5048.patch


 4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... 
 the build should fail in a situation like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory

2013-07-18 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5119:
---

 Summary: DiskDV SortedDocValues shouldnt hold doc-to-ord in heap 
memory
 Key: LUCENE-5119
 URL: https://issues.apache.org/jira/browse/LUCENE-5119
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5119.patch

These are accessed sequentially when faceting, for example, and can be a fairly 
large amount of data (based on the # of docs and # of unique terms). 

I think this was done so that conceptually random access to a specific docid 
would be faster than e.g. stored fields, but I think we should instead target 
the DV data structures towards real use cases (faceting, sorting, grouping, ...)
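
A sketch of the sequential pattern in question (countOrds is hypothetical; 
error handling elided): faceting walks matching docIDs in increasing order and 
resolves each doc's ordinal exactly once, which is what on-disk storage should 
be targeted at.

{code:java}
import java.io.IOException;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.SortedDocValues;

public class FacetSketch {
  // Count value ordinals for an increasing list of matching docIDs.
  static void countOrds(AtomicReader reader, String field,
                        int[] sortedHits, int[] counts) throws IOException {
    SortedDocValues dv = reader.getSortedDocValues(field);
    if (dv == null) {
      return; // field has no doc values
    }
    for (int doc : sortedHits) {
      int ord = dv.getOrd(doc);    // the doc-to-ord lookup this issue targets
      if (ord >= 0) {
        counts[ord]++;             // -1 means the doc has no value
      }
    }
  }
}
{code}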


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory

2013-07-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5119:


Attachment: LUCENE-5119.patch

 DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
 --

 Key: LUCENE-5119
 URL: https://issues.apache.org/jira/browse/LUCENE-5119
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5119.patch


 These are accessed sequentially when faceting, for example, and can be a 
 fairly large amount of data (based on the # of docs and # of unique terms). 
 I think this was done so that conceptually random access to a specific 
 docid would be faster than e.g. stored fields, but I think we should instead 
 target the DV data structures towards real use cases 
 (faceting, sorting, grouping, ...)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion

2013-07-18 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712526#comment-13712526
 ] 

Steve Rowe commented on SOLR-5048:
--

bq. Maybe the step "After creating a new release branch, update the version in 
the base release branch (e.g. branches/branch_4x)" needed to be reviewed (or 
merely followed) a little more closely.

I agree - my responsibility this time around, and I can see that I missed these 
in all the solrconfig.xml's ... mea culpa.

Nice catch, Jack.


 fail the build if the example solrconfig.xml files don't have an up to date 
 luceneMatchVersion
 --

 Key: SOLR-5048
 URL: https://issues.apache.org/jira/browse/SOLR-5048
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: SOLR-5048.patch


 4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... 
 the build should fail in a situation like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion

2013-07-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712527#comment-13712527
 ] 

Uwe Schindler commented on SOLR-5048:
-

We should also change the ReleaseTodo to use the {{x.y}} format in 
solrconfig.xml; see SOLR-5049.

bq. I think it should say 4.5, since LUCENE_45 is the only un-deprecated 
version for branch_4x now, right?

Yes.

 fail the build if the example solrconfig.xml files don't have an up to date 
 luceneMatchVersion
 --

 Key: SOLR-5048
 URL: https://issues.apache.org/jira/browse/SOLR-5048
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: SOLR-5048.patch


 4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... 
 the build should fail in a situation like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*%3A*&wt=xml&indent=true&fq=\{!sum field=popularity 
id=mysum\}&aggregate=true

fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin, telling it 
to sum the field popularity.

aggregate=true  - turns on the AggregatorComponent













  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*%3A*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin, telling it to 
sum the field popularity.

aggregate=true  - turns on the AggregatorComponent














 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=*%3A*&wt=xml&indent=true&fq=\{!sum field=popularity 
 id=mysum\}&aggregate=true
 fq=\{!sum field=popularity id=mysum\} - Calls the SumQParserPlugin, telling it 
 to sum the field popularity.
 aggregate=true  - turns on the AggregatorComponent

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*%3A*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin, telling it to 
sum the field popularity.

aggregate=true  - turns on the AggregatorComponent













  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=*%3A*&wt=xml&indent=true&fq={!sum field=popularity 
 id=mysum}&aggregate=true
 fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling it 
 to sum the field popularity.
 aggregate=true  - turns on the AggregatorComponent

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling it 
to sum the field popularity.

aggregate=true  - turns on the AggregatorComponent

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*%3A*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling it 
to sum the field popularity.

aggregate=true  - turns on the AggregatorComponent

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
 id=mysum}&aggregate=true
 fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling it 
 to sum the field popularity.
 aggregate=true  - turns on the AggregatorComponent

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2013-07-18 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-4542:
---

Fix Version/s: (was: 4.5)
   4.4
   5.0

 Make RECURSION_CAP in HunspellStemmer configurable
 --

 Key: LUCENE-4542
 URL: https://issues.apache.org/jira/browse/LUCENE-4542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Steve Rowe
 Fix For: 5.0, 4.4

 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, 
 LUCENE-4542-with-solr.patch


 Currently there is 
 private static final int RECURSION_CAP = 2;
 in the code of the class HunspellStemmer. It makes using Hunspell with 
 several dictionaries almost unusable, due to bad performance (e.g. it costs 
 36 ms to stem a long sentence in Latvian with recursion_cap=2 and 5 ms with 
 recursion_cap=1). It would be nice to be able to tune this number as needed.
 AFAIK this number (2) was chosen arbitrarily.
 (it's the first issue I've ever filed, so please forgive any mistakes).
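
A minimal sketch of the kind of change being requested, assuming nothing 
beyond the issue text (the class and method shape below is illustrative, not 
the committed patch):

{code:java}
// Hypothetical sketch, not the committed LUCENE-4542 patch: the fixed
// RECURSION_CAP becomes a per-instance setting, defaulting to the old value.
public class ConfigurableRecursionCap {
  private static final int DEFAULT_RECURSION_CAP = 2;
  private final int recursionCap;

  public ConfigurableRecursionCap() {
    this(DEFAULT_RECURSION_CAP); // preserves existing behavior
  }

  public ConfigurableRecursionCap(int recursionCap) {
    this.recursionCap = recursionCap; // e.g. 1 for faster, shallower stemming
  }

  boolean mayRecurse(int currentDepth) {
    return currentDepth < recursionCap; // replaces 'depth < RECURSION_CAP'
  }
}
{code}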

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2013-07-18 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe reopened LUCENE-4542:


  Assignee: Steve Rowe  (was: Adrien Grand)

Reopening to backport to 4.4, based on conversation with Adrien on #lucene-dev 
IRC.

 Make RECURSION_CAP in HunspellStemmer configurable
 --

 Key: LUCENE-4542
 URL: https://issues.apache.org/jira/browse/LUCENE-4542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Steve Rowe
 Fix For: 4.5

 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, 
 LUCENE-4542-with-solr.patch


 Currently there is 
 private static final int RECURSION_CAP = 2;
 in the code of the class HunspellStemmer. It makes using Hunspell with 
 several dictionaries almost unusable, due to bad performance (e.g. it costs 
 36 ms to stem a long sentence in Latvian with recursion_cap=2 and 5 ms with 
 recursion_cap=1). It would be nice to be able to tune this number as needed.
 AFAIK this number (2) was chosen arbitrarily.
 (it's the first issue I've ever filed, so please forgive any mistakes).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*%3A*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling it 
to sum the field popularity.

aggregate=true  - turns on the AggregatorComponent

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*%3A*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling it to 
sum the field popularity.

aggregate=true  - turns on the AggregatorComponent

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=*%3A*&wt=xml&indent=true&fq={!sum field=popularity 
 id=mysum}&aggregate=true
 fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling it 
 to sum the field popularity.
 aggregate=true  - turns on the AggregatorComponent

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling it 
to sum the field popularity.

aggregate=true  - turns on the AggregatorComponent

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
 id=mysum}&aggregate=true
 *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling it 
to sum the field popularity.

aggregate=true  - turns on the AggregatorComponent

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling it 
to sum the field popularity.

aggregate=true  - turns on the AggregatorComponent

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
 id=mysum}&aggregate=true
 fq={!sum field=popularity id=mysum} - Calls the SumQParserPlugin telling it 
 to sum the field popularity.
 aggregate=true  - turns on the AggregatorComponent

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Programmatic Synonyms Filter (Lucene and/or Solr)

2013-07-18 Thread Jack Krupansky
Agreed, to some extent. I think mostly it is a matter of how frequently the 
synonyms may be updated. OTOH, it is always technically possible to analyze 
synonym updates, perform queries on both the old and new synonyms, and then 
update the index for the documents containing synonym changes.

How much do you know about the frequency of synonym updates for this synonym 
source API?

-- Jack Krupansky

From: Shai Erera 
Sent: Thursday, July 18, 2013 1:06 PM
To: dev@lucene.apache.org 
Subject: Re: Programmatic Synonyms Filter (Lucene and/or Solr)

With index time synonyms you bloat the index with a lot of new postings, 
most of them just duplicates of each other. And in my case, because every 
synonym has a weight, I cannot even consider postings deduplication...

There's a tradeoff here (as usual). Both approaches have pros and cons. I think 
index time is better in the end because a larger index can be handled by 
throwing more hardware at it. But queries with thousands of terms are a real 
issue. 

One thing I can look at is whether the synonym sets can be 'grouped' so that 
instead of all the terms I index a group ID or something, and then during search 
I resolve a term to all the groups it may belong to... I'll need to think about 
it more. 

On Jul 18, 2013 7:49 PM, Walter Underwood wun...@wunderwood.org wrote:

  There are two serious issues with query-time synonyms, speed and correctness. 

  1. Expanding a term to 1000 synonyms at query time means 1000 term lookups. 
This will not be fast. Expanding the term at index time means 1000 posting list 
entries, but only one term lookup at query time.

  2. Query time expansion will give higher scores to the rarer synonyms. 
This is almost never what you want. If I make TV and television synonyms, I 
want them both to score the same. But if TV is 10X more common than television, 
then documents with the rare term (television) will score better.
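
To make the arithmetic concrete, a small illustration under simplifying 
assumptions (textbook idf = ln(N/df) rather than Lucene's exact Similarity 
formula, and made-up document frequencies):

{code:java}
// Simplified illustration of the scoring skew described above; uses the
// textbook idf = ln(N / df), not Lucene's exact Similarity.
public class IdfSkew {
  public static void main(String[] args) {
    double numDocs = 1000000;
    double dfTv = 100000;        // "TV" appears in 10x more documents
    double dfTelevision = 10000;

    double idfTv = Math.log(numDocs / dfTv);                 // ~2.30
    double idfTelevision = Math.log(numDocs / dfTelevision); // ~4.61

    // With query-time expansion, a hit on the rarer synonym scores roughly
    // twice as high here, even though the user meant the same concept.
    System.out.println("idf(TV)=" + idfTv
        + " idf(television)=" + idfTelevision);
  }
}
{code}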

  wunder

  On Jul 18, 2013, at 5:54 AM, Shai Erera wrote:


The examples I've seen so far are single words. But I learned today 
something new .. the number of synonyms returned for a word may be in the 
range of hundreds, sometimes even thousands.

So I'm not sure query-time synonyms will work at all ... what do you think?

Shai




On Thu, Jul 18, 2013 at 3:21 PM, Jack Krupansky j...@basetechnology.com 
wrote:

  Your best bet is to preprocess queries and expand synonyms in your own 
application layer. The Lucene/Solr synonym implementation, design, and 
architecture are fairly lightweight (although FST is a big improvement) and not 
architected for large and dynamic synonym sets.

  Do you need multi-word phrase synonyms as well, or is this strictly 
single-word synonyms?

  -- Jack Krupansky

  From: Shai Erera 
  Sent: Thursday, July 18, 2013 1:36 AM
  To: dev@lucene.apache.org 
  Subject: Programmatic Synonyms Filter (Lucene and/or Solr)

  Hi


  I was asked to integrate with a system which provides synonyms for words 
through an API. I checked the existing synonym filters in Lucene and Solr and 
they all seem to take a synonyms map up front. 

  E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so 
it's not really programmatic in the sense that I can provide an impl which will 
pull the synonyms through the other system's API.


  Solr SynonymFilterFactory just loads the synonyms from a file into a 
SynonymMap, and then uses Lucene's SynonymFilter, so it doesn't look like I can 
extend that one either.


  The problem is that the synonyms DB I should integrate with is HUGE and 
will probably not fit in RAM (SynonymMap). Nor is it currently possible to pull 
all available synonyms from it in one go. The API I have is something like 
String[] getSynonyms(String word).
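
A minimal sketch of the kind of pluggable lookup being described; the provider 
and service names below are assumptions for illustration, not existing 
Lucene/Solr APIs:

{code:java}
// Hypothetical sketch: wrap the remote String[] getSynonyms(String) call
// behind a per-word cache, so the whole synonym DB never sits in RAM.
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface RemoteSynonymService {
  String[] getSynonyms(String word); // the API described above (assumed name)
}

class CachingSynonymProvider {
  private final Map<String, List<String>> cache =
      new ConcurrentHashMap<String, List<String>>();
  private final RemoteSynonymService service;

  CachingSynonymProvider(RemoteSynonymService service) {
    this.service = service;
  }

  List<String> synonyms(String word) {
    List<String> result = cache.get(word);
    if (result == null) {
      result = Arrays.asList(service.getSynonyms(word));
      cache.put(word, result); // benign race: both threads store equal lists
    }
    return result;
  }
}
{code}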


  So I have a few questions:


  1) Did I miss a Filter which does take a programmatic syn-map which I can 
provide my own impl to?


  2) If not, would it make sense to modify SynonymMap to offer a 
getSynonyms(word) API (using BytesRef / CharsRef of course), with an 
FSTSynonymMap default impl so that users can provide their own impl, e.g. not 
requiring everything to be in RAM?


  2.1) A side-effect benefit, I think, is that we won't require everyone to 
deal with the FST API that way, though I'll admit I cannot think of many use 
cases for not using SynonymFilter as-is ...


  3) If the answer to (1) and (2) is NO, I guess my only option is to 
implement my own SynonymFilter, copying most of the code from Lucene's ... 
right?

  Shai



  --
  Walter Underwood
  wun...@wunderwood.org





[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2013-07-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712539#comment-13712539
 ] 

ASF subversion and git services commented on LUCENE-4542:
-

Commit 1504561 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1504561 ]

LUCENE-4542: move CHANGES.txt entry to the 4.4 section

 Make RECURSION_CAP in HunspellStemmer configurable
 --

 Key: LUCENE-4542
 URL: https://issues.apache.org/jira/browse/LUCENE-4542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Steve Rowe
 Fix For: 5.0, 4.4

 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, 
 LUCENE-4542-with-solr.patch


 Currently there is 
 private static final int RECURSION_CAP = 2;
 in the code of the class HunspellStemmer. It makes using Hunspell with 
 several dictionaries almost unusable, due to bad performance (e.g. it costs 
 36 ms to stem a long sentence in Latvian with recursion_cap=2 and 5 ms with 
 recursion_cap=1). It would be nice to be able to tune this number as needed.
 AFAIK this number (2) was chosen arbitrarily.
 (it's the first issue I've ever filed, so please forgive any mistakes).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output will contain a block that looks like this:

<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
 id=mysum}&aggregate=true
 *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2013-07-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712541#comment-13712541
 ] 

ASF subversion and git services commented on LUCENE-4542:
-

Commit 1504563 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1504563 ]

LUCENE-4542: move CHANGES.txt entry to the 4.4 section (merged trunk r1504561)

 Make RECURSION_CAP in HunspellStemmer configurable
 --

 Key: LUCENE-4542
 URL: https://issues.apache.org/jira/browse/LUCENE-4542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Steve Rowe
 Fix For: 5.0, 4.4

 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, 
 LUCENE-4542-with-solr.patch


 Currently there is 
 private static final int RECURSION_CAP = 2;
 in the code of the class HunspellStemmer. It makes using Hunspell with 
 several dictionaries almost unusable, due to bad performance (e.g. it costs 
 36 ms to stem a long sentence in Latvian with recursion_cap=2 and 5 ms with 
 recursion_cap=1). It would be nice to be able to tune this number as needed.
 AFAIK this number (2) was chosen arbitrarily.
 (it's the first issue I've ever filed, so please forgive any mistakes).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5050) forbidden-apis errors

2013-07-18 Thread Hoss Man (JIRA)
Hoss Man created SOLR-5050:
--

 Summary: forbidden-apis errors
 Key: SOLR-5050
 URL: https://issues.apache.org/jira/browse/SOLR-5050
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man


I'm not sure if I'm the only one seeing this, or if it's a relatively newly 
introduced error, but on trunk...

{noformat}
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] Forbidden method invocation: 
java.util.Properties#load(java.io.InputStream) [Properties files should be 
read/written with Reader/Writer, using UTF-8 charset. This allows reading older 
files with unicode escapes, too.]
[forbidden-apis]   in org.apache.solr.core.SolrCoreDiscoverer 
(SolrCoreDiscoverer.java:75)
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] Scanned 2442 (and 1361 related) class file(s) for forbidden 
API invocations (in 1.91s), 1 error(s).

BUILD FAILED
/home/hossman/lucene/dev/build.xml:67: The following error occurred while 
executing this line:
/home/hossman/lucene/dev/solr/build.xml:263: Check for forbidden API calls 
failed, see log.
{noformat}
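
For context, a minimal sketch of the change the forbidden-apis rule is asking 
for, i.e. loading the properties through a UTF-8 Reader instead of a raw 
InputStream (the path handling below is illustrative):

{code:java}
// Sketch of the suggested fix: use the Properties#load(Reader) overload
// with an explicit UTF-8 charset instead of the forbidden InputStream one.
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class LoadPropertiesUtf8 {
  public static Properties load(String path) throws Exception {
    Properties props = new Properties();
    try (Reader reader = new InputStreamReader(
        new FileInputStream(path), StandardCharsets.UTF_8)) {
      props.load(reader); // also reads older files with unicode escapes
    }
    return props;
  }
}
{code}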

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output will contain a block that looks like this:

<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
 id=mysum}&aggregate=true
 *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent
 The output will contain a block that looks like this:
 <lst name="aggregates">
   <lst name="mysum">
     <long name="sum">85</long>
   </lst>
 </lst>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output contains a block that looks like this:

<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
 id=mysum}&aggregate=true
 *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent
 The output contains a block that looks like this:
 <lst name="aggregates">
   <lst name="mysum">
     <long name="sum">85</long>
   </lst>
 </lst>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory

2013-07-18 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712551#comment-13712551
 ] 

David Smiley commented on LUCENE-5119:
--

Would it be easy to add random access as an option?  Looking at your patch, 
which was pretty simple, it doesn't appear that it'd be hard to support random 
access should an application wish to use it.

A realistic example in my mind is a spatial filter in which a potentially large 
binary geometry representation of a shape is encoded for each document into 
DiskDV.  Some fast leading filters narrow down the applicable documents, but 
some documents' shape geometries need to be consulted in the DiskDV afterwards.  
Does that make sense?

 DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
 --

 Key: LUCENE-5119
 URL: https://issues.apache.org/jira/browse/LUCENE-5119
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5119.patch


 These are accessed sequentially when e.g. faceting, and can be a fairly large 
 amount of data (based on # of docs and # of unique terms). 
 I think this was done so that conceptually random access to a specific 
 docid would be faster than e.g. stored fields, but I think we should instead 
 target the DV data structures towards real use cases 
 (faceting, sorting, grouping, ...)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712548#comment-13712548
 ] 

Jack Krupansky commented on SOLR-5045:
--

One interesting test case: There has been some interest in adding median to 
the stats component. The difficulty is that you need to build up the frequency 
distribution so that you can find the value that is >= half of the values, 
which is a lot more effort than simply adding values to an accumulator.
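
A minimal sketch of why median is harder than sum, under the simplifying 
assumption of long-valued fields (the names below are illustrative):

{code:java}
// Hypothetical sketch: median requires keeping the whole frequency
// distribution, where sum needs only a single accumulator.
import java.util.Map;
import java.util.TreeMap;

class MedianAccumulator {
  private final TreeMap<Long, Integer> freq = new TreeMap<Long, Integer>();
  private long count;

  void collect(long value) {
    Integer c = freq.get(value);
    freq.put(value, c == null ? 1 : c + 1); // value -> occurrence count
    count++;
  }

  // Walk the sorted distribution until half the values are covered
  // (returns the lower median for even counts).
  long median() {
    long target = (count + 1) / 2;
    long seen = 0;
    for (Map.Entry<Long, Integer> e : freq.entrySet()) {
      seen += e.getValue();
      if (seen >= target) {
        return e.getKey();
      }
    }
    throw new IllegalStateException("no values collected");
  }
}
{code}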


 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
 id=mysum}&aggregate=true
 *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent
 The output contains a block that looks like this:
 <lst name="aggregates">
   <lst name="mysum">
     <long name="sum">85</long>
   </lst>
 </lst>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output contains a block that looks like this:

<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
id=mysum}&aggregate=true

*fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output contains a block that looks like this:

<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=*:*&wt=xml&indent=true&fq={!sum field=popularity 
 id=mysum}&aggregate=true
 *fq={!sum field=popularity id=mysum}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent
 The output contains a block that looks like this:
 <lst name="aggregates">
   <lst name="mysum">
     <long name="sum">85</long>
   </lst>
 </lst>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (SOLR-5049) switch to using luceneMatchVersion X.Y in example configs instead of LUCENE_XY

2013-07-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712549#comment-13712549
 ] 

ASF subversion and git services commented on SOLR-5049:
---

Commit 1504566 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1504566 ]

SOLR-5049: use '5.0' as luceneMatchVersion in all example solrconfig.xml files

 switch to using luceneMatchVersion X.Y in example configs instead of 
 LUCENE_XY
 

 Key: SOLR-5049
 URL: https://issues.apache.org/jira/browse/SOLR-5049
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.4


 Uwe just pointed out to me on IRC that you can specify 
 {{<luceneMatchVersion/>}} using X.Y instead of the more internal, Java 
 variable-esque LUCENE_XY.
 I have no idea why we haven't been doing this in the past ... it makes so 
 much more sense for end users; we should absolutely do this moving forward.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true

*fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output contains a block that looks like this:

<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true

*fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output contains a block that looks like this:

<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>


 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true
 *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent
 The output contains a block that looks like this:
 <lst name="aggregates">
   <lst name="mysum">
     <long name="sum">85</long>
   </lst>
 </lst>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true

*fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output contains a block that looks like this:

<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true

*fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output contains a block that looks like this:

<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true
 *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent
 The output contains a block that looks like this:
 <lst name="aggregates">
   <lst name="mysum">
     <long name="sum">85</long>
   </lst>
 </lst>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true

*fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output contains a block that looks like this:
{code:xml}
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
{code}

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true

*fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output contains a block that looks like this:

<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true
 *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent
 The output contains a block that looks like this:
 {code:xml}
 <lst name="aggregates">
   <lst name="mysum">
     <long name="sum">85</long>
   </lst>
 </lst>
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712554#comment-13712554
 ] 

Jack Krupansky commented on SOLR-5045:
--

Can I script some custom analytics?

Or is that simply a question of how this new component hooks in with the 
proposed JavaScriptRequestHandler (SOLR-5005)?


 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true
 *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent
 The output contains a block that looks like this:
 {code:xml}
 <lst name="aggregates">
   <lst name="mysum">
     <long name="sum">85</long>
   </lst>
 </lst>
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory

2013-07-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712558#comment-13712558
 ] 

Robert Muir commented on LUCENE-5119:
-

I don't plan to do this. That's why we have a codec API...

 DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
 --

 Key: LUCENE-5119
 URL: https://issues.apache.org/jira/browse/LUCENE-5119
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5119.patch


 These are accessed sequentially when e.g. faceting, and can be a fairly large 
 amount of data (based on the number of docs and the number of unique terms). 
 I think this was done so that conceptually random access to a specific 
 docid would be faster than e.g. stored fields, but I think we should instead 
 target the DV data structures towards real use cases 
 (faceting, sorting, grouping, ...).
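
To make the access pattern concrete, here is an illustrative sketch (not code 
from the patch) of the sequential doc-to-ord scan that faceting-style counting 
performs; random per-doc lookups never happen in this loop:

{code:java}
// Illustrative sketch: faceting-style counting walks docids in order, so
// doc-to-ord lookups are a forward-only scan, not random access.
import java.io.IOException;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.SortedDocValues;

public class OrdCounter {
  public static int[] countOrds(AtomicReader reader, String field) throws IOException {
    SortedDocValues dv = reader.getSortedDocValues(field);
    if (dv == null) {
      return new int[0]; // field has no doc values on this reader
    }
    int[] counts = new int[dv.getValueCount()];
    for (int doc = 0; doc < reader.maxDoc(); doc++) {
      int ord = dv.getOrd(doc); // sequential doc-to-ord lookup
      if (ord >= 0) {           // -1 means the doc has no value
        counts[ord]++;
      }
    }
    return counts;
  }
}
{code}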

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5045:
-

Description: 
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true

*fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output contains a block that looks like this:
{code:xml}
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
{code}

  was:
This ticket provides a pluggable aggregation framework through the introduction 
of a new *Aggregator* interface and a new search component called the 
*AggregatorComponent*.

The *Aggregator* interface extends the PostFilter interface providing methods 
that allow DelegatingCollectors to perform aggregation at collect time. 
Aggregators were designed to play nicely with the CollapsingQParserPlugin 
introduced in SOLR-5027. 

The *AggregatorComponent* manages the output and distributed merging of 
aggregate results.

This ticket is an alternate design to SOLR-4465 which had the same basic idea 
but a very different implementation. This implementation resolves the caching 
issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
collapsing. It is also much less intrusive on the core code as it's entirely 
implemented with plugins.

Initial Syntax for the sample SumQParserPlugin Aggregator:

../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true

*fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling it 
to sum the field popularity.

*aggregate=true*  - turns on the AggregatorComponent

The output contains a block that looks like this:
{code:xml}
<lst name="aggregates">
  <lst name="mysum">
    <long name="sum">85</long>
  </lst>
</lst>
{code}

 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true
 *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent
 The output contains a block that looks like this:
 {code:xml}
 <lst name="aggregates">
   <lst name="mysum">
     <long name="sum">85</long>
   </lst>
 </lst>
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: 

[jira] [Resolved] (SOLR-5050) forbidden-apis errors

2013-07-18 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-5050.


Resolution: Not A Problem

My mistake ... apparently I wasn't as clean as I thought.

 forbidden-apis errors
 -

 Key: SOLR-5050
 URL: https://issues.apache.org/jira/browse/SOLR-5050
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man

 I'm not sure if I'm the only one seeing this, or if it's a relatively newly 
 introduced error, but on trunk...
 {noformat}
 [forbidden-apis] Scanning for API signatures and dependencies...
 [forbidden-apis] Forbidden method invocation: 
 java.util.Properties#load(java.io.InputStream) [Properties files should be 
 read/written with Reader/Writer, using UTF-8 charset. This allows reading 
 older files with unicode escapes, too.]
 [forbidden-apis]   in org.apache.solr.core.SolrCoreDiscoverer 
 (SolrCoreDiscoverer.java:75)
 [forbidden-apis] WARNING: The referenced class 
 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. 
 Please fix the classpath!
 [forbidden-apis] WARNING: The referenced class 
 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. 
 Please fix the classpath!
 [forbidden-apis] WARNING: The referenced class 
 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
 the classpath!
 [forbidden-apis] WARNING: The referenced class 
 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. 
 Please fix the classpath!
 [forbidden-apis] Scanned 2442 (and 1361 related) class file(s) for forbidden 
 API invocations (in 1.91s), 1 error(s).
 BUILD FAILED
 /home/hossman/lucene/dev/build.xml:67: The following error occurred while 
 executing this line:
 /home/hossman/lucene/dev/solr/build.xml:263: Check for forbidden API calls 
 failed, see log.
 {noformat}
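
For context, the change forbidden-apis is asking for here is mechanical: read 
the properties file through a UTF-8 Reader instead of a raw InputStream (whose 
Properties#load overload assumes ISO-8859-1). A minimal sketch; the class name 
and path handling are illustrative, not the actual SolrCoreDiscoverer code:

{code:java}
// Sketch of the Reader-based load that forbidden-apis wants, with the
// charset made explicit instead of relying on Properties' ISO-8859-1.
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class PropertiesUtf8 {
  public static Properties load(String path) throws IOException {
    Properties props = new Properties();
    try (InputStream in = Files.newInputStream(Paths.get(path));
         Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
      props.load(reader); // Reader overload: charset is under our control
    }
    return props;
  }
}
{code}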

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-5050) forbidden-apis errors

2013-07-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed SOLR-5050.
---


 forbidden-apis errors
 -

 Key: SOLR-5050
 URL: https://issues.apache.org/jira/browse/SOLR-5050
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man

 I'm not sure if I'm the only one seeing this, or if it's a relatively newly 
 introduced error, but on trunk...
 {noformat}
 [forbidden-apis] Scanning for API signatures and dependencies...
 [forbidden-apis] Forbidden method invocation: 
 java.util.Properties#load(java.io.InputStream) [Properties files should be 
 read/written with Reader/Writer, using UTF-8 charset. This allows reading 
 older files with unicode escapes, too.]
 [forbidden-apis]   in org.apache.solr.core.SolrCoreDiscoverer 
 (SolrCoreDiscoverer.java:75)
 [forbidden-apis] WARNING: The referenced class 
 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. 
 Please fix the classpath!
 [forbidden-apis] WARNING: The referenced class 
 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. 
 Please fix the classpath!
 [forbidden-apis] WARNING: The referenced class 
 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
 the classpath!
 [forbidden-apis] WARNING: The referenced class 
 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. 
 Please fix the classpath!
 [forbidden-apis] Scanned 2442 (and 1361 related) class file(s) for forbidden 
 API invocations (in 1.91s), 1 error(s).
 BUILD FAILED
 /home/hossman/lucene/dev/build.xml:67: The following error occurred while 
 executing this line:
 /home/hossman/lucene/dev/solr/build.xml:263: Check for forbidden API calls 
 failed, see log.
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion

2013-07-18 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5048:
---

Attachment: SOLR-5048.patch

much simpler patch that depends on the changes in SOLR-5049

 fail the build if the example solrconfig.xml files don't have an up to date 
 luceneMatchVersion
 --

 Key: SOLR-5048
 URL: https://issues.apache.org/jira/browse/SOLR-5048
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: SOLR-5048.patch, SOLR-5048.patch


 4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... 
 the build should fail in a situation like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712566#comment-13712566
 ] 

Joel Bernstein commented on SOLR-5045:
--

You have the flexibility to calculate a median, at least on a single server; 
I'm not sure what the best approach would be. Distributed median may be 
harder: you'd have to build up distributions in a way that can be merged.

Scripting is a very cool thing. I need to do some research on SOLR-5005, 
though, and see if it can be applied.
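
One way the "distributions that can be merged" idea could work, purely as an 
illustrative sketch (nothing like this is in the patch): each shard fills a 
fixed-range histogram, the coordinator merges them bucket-wise, and an 
approximate median is read off the merged counts.

{code:java}
// Illustrative only: a fixed-range histogram whose per-shard instances
// merge by element-wise addition, giving an approximate median.
public class MergeableHistogram {
  private final long[] counts;
  private final double min, max;

  public MergeableHistogram(double min, double max, int buckets) {
    this.min = min;
    this.max = max;
    this.counts = new long[buckets];
  }

  // Clamp out-of-range values into the edge buckets.
  public void add(double value) {
    int b = (int) ((value - min) / (max - min) * counts.length);
    counts[Math.max(0, Math.min(b, counts.length - 1))]++;
  }

  // Element-wise addition: shard histograms merge in any order.
  public void merge(MergeableHistogram other) {
    for (int i = 0; i < counts.length; i++) {
      counts[i] += other.counts[i];
    }
  }

  // Midpoint of the bucket that crosses the 50th percentile.
  public double approximateMedian() {
    long total = 0;
    for (long c : counts) {
      total += c;
    }
    long seen = 0;
    for (int i = 0; i < counts.length; i++) {
      seen += counts[i];
      if (2 * seen >= total) {
        return min + (i + 0.5) * (max - min) / counts.length;
      }
    }
    return max;
  }
}
{code}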


 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true
 *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent
 The output contains a block that looks like this:
 {code:xml}
 <lst name="aggregates">
   <lst name="mysum">
     <long name="sum">85</long>
   </lst>
 </lst>
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5048) fail the build if the example solrconfig.xml files don't have an up to date luceneMatchVersion

2013-07-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712567#comment-13712567
 ] 

ASF subversion and git services commented on SOLR-5048:
---

Commit 1504570 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1504570 ]

SOLR-5048: fail the build if the example solrconfig.xml files do not have an up 
to date luceneMatchVersion

 fail the build if the example solrconfig.xml files don't have an up to date 
 luceneMatchVersion
 --

 Key: SOLR-5048
 URL: https://issues.apache.org/jira/browse/SOLR-5048
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: SOLR-5048.patch, SOLR-5048.patch


 4.4 RC0 still had {{<luceneMatchVersion>LUCENE_43</luceneMatchVersion>}} ... 
 the build should fail in a situation like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2013-07-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712568#comment-13712568
 ] 

ASF subversion and git services commented on LUCENE-4542:
-

Commit 1504571 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_4'
[ https://svn.apache.org/r1504571 ]

LUCENE-4542: Make HunspellStemFilter's maximum recursion level configurable & 
move CHANGES.txt entry to the 4.4 section (merged trunk r1504529 & r1504561)

 Make RECURSION_CAP in HunspellStemmer configurable
 --

 Key: LUCENE-4542
 URL: https://issues.apache.org/jira/browse/LUCENE-4542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Steve Rowe
 Fix For: 5.0, 4.4

 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, 
 LUCENE-4542-with-solr.patch


 Currently there is 
 private static final int RECURSION_CAP = 2;
 in the code of the class HunspellStemmer. It makes using Hunspell with 
 several dictionaries almost unusable due to bad performance (for example, it 
 costs 36ms to stem a long sentence in Latvian with recursion_cap=2 versus 
 5ms with recursion_cap=1). It would be nice to be able to tune this number 
 as needed.
 AFAIK this number (2) was chosen arbitrarily.
 (This is the first issue I've filed, so please forgive any mistakes.)
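
For illustration, consuming a configurable cap might look like the sketch 
below; the exact constructor signature is an assumption here, so check the 
attached patch for what was actually added:

{code:java}
// Assumed usage sketch: the recursionCap argument is what LUCENE-4542
// proposes; the real signature is in the attached patch.
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.hunspell.HunspellDictionary;
import org.apache.lucene.analysis.hunspell.HunspellStemFilter;

public class LowRecursionStemming {
  public static TokenStream wrap(TokenStream tokens, HunspellDictionary dict) {
    // dedup=true as in the default filter; recursionCap=1 trades some
    // stems for the ~7x stemming speedup described above.
    return new HunspellStemFilter(tokens, dict, true, 1);
  }
}
{code}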

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5045) Pluggable Analytics

2013-07-18 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712566#comment-13712566
 ] 

Joel Bernstein edited comment on SOLR-5045 at 7/18/13 5:59 PM:
---

You have the flexibility to calculate a median, at least on a single server; 
I'm not sure what the best approach would be. Distributed median may be 
harder: you'd have to build up distributions in a way that can be merged.

Scripting is a very cool thing. I need to do some research on SOLR-5005, 
though, and see if it can be applied.


  was (Author: joel.bernstein):
You have the flexibility to calculate a median, at least on a single server; 
I'm not sure what the best approach would be. Distributed median may be 
harder: you'd have to build up distributions in a way that can be merged.

Scripting is a very cool thing. I need to do some research on SOLR-5005, 
though, and see if can be applied.

  
 Pluggable Analytics
 ---

 Key: SOLR-5045
 URL: https://issues.apache.org/jira/browse/SOLR-5045
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5045.patch, SOLR-5045.patch


 This ticket provides a pluggable aggregation framework through the 
 introduction of a new *Aggregator* interface and a new search component 
 called the *AggregatorComponent*.
 The *Aggregator* interface extends the PostFilter interface providing methods 
 that allow DelegatingCollectors to perform aggregation at collect time. 
 Aggregators were designed to play nicely with the CollapsingQParserPlugin 
 introduced in SOLR-5027. 
 The *AggregatorComponent* manages the output and distributed merging of 
 aggregate results.
 This ticket is an alternate design to SOLR-4465 which had the same basic idea 
 but a very different implementation. This implementation resolves the caching 
 issues in SOLR-4465 and combined with SOLR-5027 plays nicely with field 
 collapsing. It is also much less intrusive on the core code as it's entirely 
 implemented with plugins.
 Initial Syntax for the sample SumQParserPlugin Aggregator:
 ../select?q=\*:\*&wt=xml&indent=true&fq=\{!sum field=popularity id=mysum\}&aggregate=true
 *fq=\{!sum field=popularity id=mysum\}* - Calls the SumQParserPlugin telling 
 it to sum the field popularity.
 *aggregate=true*  - turns on the AggregatorComponent
 The output contains a block that looks like this:
 {code:xml}
 <lst name="aggregates">
   <lst name="mysum">
     <long name="sum">85</long>
   </lst>
 </lst>
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


