PyLucene 4.8.0 - 'make install' with 'root=' missing a couple of files

2014-05-27 Thread Eduard Rozenberg
Hello folks,

I’m working on packaging PyLucene into a Slackware
package by using a setup.cfg in the source directory
and redirecting the installation root to
/tmp/pylucene_installdir.

I noticed that a couple of files are missing when doing 
this alternate root install compared to the regular install.

## setup.cfg ###

[easy_install]

[build]

[install]
root = /tmp/pylucene_installdir
compile = False
force = True
single-version-externally-managed = True

##

I noticed that two files are missing when I do this root=
install compared to the regular install to /usr/lib…/.
Are these two files below not necessary when
packaging PyLucene for distribution?

Missing: native_libs.txt
--
Contains: 
lucene/_lucene.so


Missing: _lucene.py
--
Contains: 
def __bootstrap__():
    global __bootstrap__, __loader__, __file__
    import sys, pkg_resources, imp
    __file__ = pkg_resources.resource_filename(__name__, '_lucene.so')
    __loader__ = None; del __bootstrap__, __loader__
    imp.load_dynamic(__name__,__file__)
__bootstrap__()


Thanks in advance!

Regards,
—Ed

Re: PyLucene 4.8.0 - 'make install' with 'root=' missing a couple of files

2014-05-27 Thread Andi Vajda


On Tue, 27 May 2014, Eduard Rozenberg wrote:


Hello folks,

I'm working on packaging PyLucene to a Slackware
package by using a setup.cfg in the source directory
and redirecting the installation root to
/tmp/pylucene_installdir.

I noticed that a couple of files are missing when doing
this alternate root install compared to the regular install.

## setup.cfg ###

[easy_install]

[build]

[install]
root = /tmp/pylucene_installdir
compile = False
force = True
single-version-externally-managed = True

##

I noticed that two files are missing when I do this root=
install compared to the regular install to /usr/lib…/
Are these two files below not necessary when
packaging PyLucene for distribution?

Missing: native_libs.txt


I don't know what this file, native_libs.txt, is for. Maybe a setuptools 
artifact?



--
Contains:
lucene/_lucene.so


Missing: _lucene.py


Yes, that one you need.
Did you try running pylucene tests without it ?

Andi..


--
Contains:
def __bootstrap__():
   global __bootstrap__, __loader__, __file__
   import sys, pkg_resources, imp
   __file__ = pkg_resources.resource_filename(__name__, '_lucene.so')
   __loader__ = None; del __bootstrap__, __loader__
   imp.load_dynamic(__name__,__file__)
__bootstrap__()


Thanks in advance!

Regards,
—Ed



Getting term vectors/computing cosine similarity

2014-05-27 Thread Michael O'Leary
*tl;dr*: a next() method is defined for the Java class TVTermsEnum in
Lucene 4.8.1, but it looks like there is no next() method available for an
object that looks like it is an instance of the Python class TVTermsEnum in
PyLucene 4.8.1.

I have a set of documents that I would like to cluster. These documents
share a vocabulary of only about 3,000 unique terms, but there are about
15,000,000 documents. One way I thought of doing this would be to index the
documents using PyLucene (Python is the preferred programming language at
work), obtain term vectors for the documents using PyLucene API functions,
and calculate cosine similarities between pairs of term vectors in order to
determine which documents are close to each other.
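The last step, pairwise cosine similarity, is independent of Lucene itself. A minimal pure-Python sketch over sparse term-frequency dicts (the shape of data that iterating a term vector would produce; the function and variable names here are illustrative, not PyLucene API):

```python
from math import sqrt

def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two sparse term-frequency vectors,
    represented as {term: frequency} dicts."""
    # Dot product over the shared vocabulary only.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = sqrt(sum(f * f for f in tf_a.values()))
    norm_b = sqrt(sum(f * f for f in tf_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

doc_a = {"cat": 2, "dog": 1}
doc_b = {"cat": 1, "dog": 1}
print(round(cosine_similarity(doc_a, doc_b), 4))  # → 0.9487
```

With only ~3,000 unique terms, dense vectors would also be practical, though all-pairs comparison over 15,000,000 documents is still O(n²) and usually wants blocking or an approximate-nearest-neighbor method on top.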

I found some sample Java code on the web that various people have posted
showing ways to do this with older versions of Lucene. I downloaded
PyLucene 4.8.1 and compared its API functions with the ones used in the
code samples, and saw that this is an area of Lucene that has changed quite
a bit. I can send an email to the lucene-user mailing group to ask what
would be a good way of doing this using version 4.8.1, but the question I
have for this mailing group has to do with some Java API functions that it
looks like are not exposed in Python, unless I have to go about accessing
them in a different way.

If I obtain the term vector for the field cat_ids in a document with id
doc_id_1

doc_1_tfv = reader.getTermVector(doc_id_1, "cat_ids")

then doc_1_tfv is displayed as this object:

Terms:
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTerms@32c46396


In some of the sample code I looked at, the terms in doc_1_tfv could be
obtained with doc_1_tfv.getTerms(), but it looks like getTerms is not a
member function of Terms or its subclasses any more. In another code
sample, an iterator for the term vector is obtained via tfv_iter =
doc_1_tfv.iterator(None) and then the terms are obtained one by one with
calls to tfv_iter.next(). This is where I get stuck. tfv_iter has this
value:

TermsEnum:
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum@1cca2369


and there is a next() function defined for the TVTermsEnum class, but this
object doesn't list next() as one of its member functions and an exception
is raised if it is called. It looks like the object only supports the
member functions defined for the TermsEnum class, and next() is not one of
them. Is this the case, or is there a way to have it support all of the
TVTermsEnum member functions, including next()? TVTermsEnum is a private
class in CompressingTermVectorsReader.java.

So I am wondering if there is a way to obtain term vectors in this way and
that I am just not treating doc_1_tfv and tfv_iter in the right way, or if
there is a different, better way to get term vectors for documents in a
PyLucene index, or if this isn't something that Lucene should be used for.
Thank you very much for any help you can provide.
Mike


[jira] [Commented] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)

2014-05-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009465#comment-14009465
 ] 

Jan Høydahl commented on SOLR-6113:
---

It is actually by design: if you e.g. disallow all fielded search with 
{{uf=-*}}, then eDismax will not interpret x:y as field and value, but as a 
valid query for the literal x y. Some languages use : as a separator; e.g. 
Swedish will write {{FN:s}} (meaning UN's). The same approach is taken when 
some field names are disallowed.

But I see that it can be confusing for people who intend to search a field; it 
would be better if Solr could give feedback that fielded search is not 
allowed and that it fell back to literal matching.
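The fallback behavior described here can be pictured with a toy model of the uf check (a sketch of the described behavior with hypothetical names, not Solr's actual implementation):

```python
def treat_as_fielded(token, uf):
    """Return True if 'x:y' should be parsed as a fielded query,
    False if it falls back to a literal search for 'x y'.
    `uf` is a set of user-field specs: '*', 'name', '-name', '-*'.
    Toy model of eDismax uf handling, not Solr source code."""
    if ":" not in token:
        return False
    field = token.split(":", 1)[0]
    if "-*" in uf or ("-" + field) in uf:
        return False  # field disallowed: fall back to literal matching
    return "*" in uf or field in uf

print(treat_as_fielded("FN:s", {"-*"}))               # → False (literal "FN s")
print(treat_as_fielded("user:Anna", {"*", "-user"}))  # → False
print(treat_as_fielded("id:b*", {"*"}))               # → True
```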

 Edismax doesn't parse well the query uf (User Fields)
 -

 Key: SOLR-6113
 URL: https://issues.apache.org/jira/browse/SOLR-6113
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Reporter: Liram Vardi

 It seems that the Edismax User Fields feature does not behave as expected.
 For instance, assuming the following query:
 _q= id:b* user:Anna Collins&defType=edismax&uf=* -user&rows=0_
 The parsed query (taken from query debug info) is:
 _+((id:b* (text:user) (text:anna collins))~1)_
 I expect that because user was filtered out in uf (User Fields), the 
 parsed query should not contain the user search part.
 In other words, the parsed query should look simply like this:  _+id:b*_
 This issue is affected by the patch on issue SOLR-2649: when changing the 
 default OP of Edismax to AND, the query results change.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5674) A new token filter: SubSequence

2014-05-27 Thread Nitzan Shaked (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nitzan Shaked updated LUCENE-5674:
--

Attachment: (was: subseqfilter.patch)

 A new token filter: SubSequence
 ---

 Key: LUCENE-5674
 URL: https://issues.apache.org/jira/browse/LUCENE-5674
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nitzan Shaked
Priority: Minor
   Original Estimate: 24h
  Remaining Estimate: 24h

 A new configurable token filter which, given a token, breaks it into sub-parts 
 and outputs consecutive sub-sequences of those sub-parts.
 Useful, for example, during indexing to generate variations on 
 domain names, so that www.google.com can be found by searching for 
 google.com or www.google.com.
 Parameters:
 sepRegexp: A regular expression used to split incoming tokens into sub-parts.
 glue: A string used to concatenate sub-parts together when creating 
 sub-sequences.
 minLen: Minimum length (in sub-parts) of output sub-sequences
 maxLen: Maximum length (in sub-parts) of output sub-sequences (0 for 
 unlimited; negative numbers for token length in sub-parts minus specified 
 length)
 anchor: Anchor.START to output only prefixes, Anchor.END to output only 
 suffixes, or Anchor.NONE to output any sub-sequence
 withOriginal: whether to also output the original token
 EDIT: now includes tests for filter and for factory.
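As a rough illustration of the parameters above, here is a pure-Python sketch of the described behavior (not the actual Lucene TokenFilter; names mirror the parameter list):

```python
import re

def sub_sequences(token, sep_regexp=r"\.", glue=".", min_len=1, max_len=0,
                  anchor="NONE", with_original=False):
    """Emit consecutive sub-sequences of a token's sub-parts,
    mimicking the described SubSequence filter semantics."""
    parts = re.split(sep_regexp, token)
    n = len(parts)
    # max_len == 0 means unlimited; negative means n minus that length.
    hi = n if max_len == 0 else (n + max_len if max_len < 0 else max_len)
    out = [token] if with_original else []
    for length in range(min_len, hi + 1):
        for start in range(n - length + 1):
            if anchor == "START" and start != 0:
                continue
            if anchor == "END" and start + length != n:
                continue
            out.append(glue.join(parts[start:start + length]))
    return out

print(sub_sequences("www.google.com", min_len=2, anchor="END"))
# → ['google.com', 'www.google.com']
```

With min_len=2 and Anchor.END, www.google.com yields exactly the suffix variations from the description, so a search for google.com matches the indexed domain.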






[jira] [Updated] (LUCENE-5674) A new token filter: SubSequence

2014-05-27 Thread Nitzan Shaked (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nitzan Shaked updated LUCENE-5674:
--

Attachment: subseqfilter.patch

Updated patch, contains new format header for the one place that used the 
old format header

 A new token filter: SubSequence
 ---

 Key: LUCENE-5674
 URL: https://issues.apache.org/jira/browse/LUCENE-5674
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Nitzan Shaked
Priority: Minor
 Attachments: subseqfilter.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 A new configurable token filter which, given a token, breaks it into sub-parts 
 and outputs consecutive sub-sequences of those sub-parts.
 Useful, for example, during indexing to generate variations on 
 domain names, so that www.google.com can be found by searching for 
 google.com or www.google.com.
 Parameters:
 sepRegexp: A regular expression used to split incoming tokens into sub-parts.
 glue: A string used to concatenate sub-parts together when creating 
 sub-sequences.
 minLen: Minimum length (in sub-parts) of output sub-sequences
 maxLen: Maximum length (in sub-parts) of output sub-sequences (0 for 
 unlimited; negative numbers for token length in sub-parts minus specified 
 length)
 anchor: Anchor.START to output only prefixes, Anchor.END to output only 
 suffixes, or Anchor.NONE to output any sub-sequence
 withOriginal: whether to also output the original token
 EDIT: now includes tests for filter and for factory.






[jira] [Updated] (SOLR-6077) Create 5 minute tutorial

2014-05-27 Thread Taka (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Taka updated SOLR-6077:
---

Attachment: 5minTutorial-v01.markdown

Here's my first draft (v0.1) for the 5 minute tutorial (in markdown). Please 
post any feedback.

For the future release of Solr, I would like to update or add better examples. 
Should I file this separately in JIRA?

 Create 5 minute tutorial
 

 Key: SOLR-6077
 URL: https://issues.apache.org/jira/browse/SOLR-6077
 Project: Solr
  Issue Type: Sub-task
Reporter: Grant Ingersoll
 Attachments: 5minTutorial-v01.markdown


 Per the new site design for Solr, we'd like to have a 5 minutes to Solr 
 tutorial that covers users getting their data in and querying it.  






[jira] [Commented] (SOLR-6078) Create First Day Documentation

2014-05-27 Thread Taka (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009508#comment-14009508
 ] 

Taka commented on SOLR-6078:


I've been trying to figure out the contents for the first day tutorial.

It should include more than the current tutorial.
https://lucene.apache.org/solr/4_8_1/tutorial.html

But these wiki contents are more for the first week.
http://wiki.apache.org/solr/
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

I'm trying to identify an appropriate list of topics for the first day doc. For 
example, should SolrCloud be included in this doc? Do you know of good data sets 
(UCI ML Repo, Wikipedia, etc.) for this tutorial? 

I would appreciate any suggestions/ideas!

 Create First Day Documentation
 --

 Key: SOLR-6078
 URL: https://issues.apache.org/jira/browse/SOLR-6078
 Project: Solr
  Issue Type: Sub-task
Reporter: Grant Ingersoll

 As one progresses from getting started with Solr, it is important to show how 
 their work will develop from simple acts with basic data sets, to more 
 complex.  This tutorial should highlight what a user is likely to need to 
 know in their first day with Solr.






[jira] [Commented] (SOLR-6077) Create 5 minute tutorial

2014-05-27 Thread Taka (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009510#comment-14009510
 ] 

Taka commented on SOLR-6077:


I will include instructions for Windows later. v0.1 is only for unix-based 
systems.

 Create 5 minute tutorial
 

 Key: SOLR-6077
 URL: https://issues.apache.org/jira/browse/SOLR-6077
 Project: Solr
  Issue Type: Sub-task
Reporter: Grant Ingersoll
 Attachments: 5minTutorial-v01.markdown


 Per the new site design for Solr, we'd like to have a 5 minutes to Solr 
 tutorial that covers users getting their data in and querying it.  






Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 559 - Still Failing

2014-05-27 Thread Dawid Weiss
 java.lang.ArrayStoreException: unknown
 at 
 __randomizedtesting.SeedInfo.seed([97ABA0C8320EBCCB:E5E785C7836E0AB8]:0)
 at 
 org.apache.lucene.util.RamUsageEstimator$IdentityHashSet.add(RamUsageEstimator.java:674)

Oh boy... what the hell is this? :)

D.




[jira] [Commented] (SOLR-6077) Create 5 minute tutorial

2014-05-27 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009539#comment-14009539
 ] 

Varun Thacker commented on SOLR-6077:
-

I think we should edit point 1 to be "Apache Solr runs on Java 7 or greater" 
instead of "Java 1.7 u55 or higher".

I don't think we need to point out that it is recommended to use u55 for 
someone just wanting to try out Solr in 5 mins. 

Regarding Point 4 - maybe we could spin this into another issue, but currently:
The CSV example has 10 documents.
The JSON example has 4 documents.
The XML example has 32 documents. 

We should at least have the same documents in all the examples and have 
slightly more documents in each? 


 Create 5 minute tutorial
 

 Key: SOLR-6077
 URL: https://issues.apache.org/jira/browse/SOLR-6077
 Project: Solr
  Issue Type: Sub-task
Reporter: Grant Ingersoll
 Attachments: 5minTutorial-v01.markdown


 Per the new site design for Solr, we'd like to have a 5 minutes to Solr 
 tutorial that covers users getting their data in and querying it.  






[jira] [Commented] (LUCENE-5675) ID postings format

2014-05-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009545#comment-14009545
 ] 

Michael McCandless commented on LUCENE-5675:


Thanks Steve!

 ID postings format
 

 Key: LUCENE-5675
 URL: https://issues.apache.org/jira/browse/LUCENE-5675
 Project: Lucene - Core
  Issue Type: New Feature
Affects Versions: 4.9, 5.0
Reporter: Robert Muir
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5675.patch


 Today the primary key lookup in Lucene is not that great for systems like 
 Solr and Elasticsearch that have versioning in front of IndexWriter.
 To some extent BlockTree can sometimes help avoid seeks by telling you the 
 term does not exist for a segment. But this technique (based on FST prefix) 
 is fragile. The only other choice today is bloom filters, which use up huge 
 amounts of memory.
 I don't think we are using everything we know: particularly the version 
 semantics.
 Instead, if the FST for the terms index used an algebra that represents the 
 max version for any subtree, we might be able to answer that there is no term 
 T with version > V in that segment very efficiently.
 Also ID fields don't need postings lists, and they don't need stats like 
 docfreq/totaltermfreq, etc.; this stuff is all implicit. 
 As far as API, I think for users to provide IDs with versions to such a PF, 
 a start would be to set a payload or whatever on the term field to get it through 
 indexwriter to the codec. And a consumer of the codec can just cast the 
 Terms to a subclass that exposes the FST to do this version check efficiently.
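The max-version-per-subtree idea above can be sketched with a plain trie standing in for the FST (a toy model of the proposal, not Lucene code):

```python
class MaxVersionTrie:
    """Trie where every node carries the max version in its subtree,
    so a lookup can prune as soon as that max is not above the target."""
    def __init__(self):
        self.root = {"max": -1, "kids": {}, "version": None}

    def add(self, term, version):
        node = self.root
        node["max"] = max(node["max"], version)
        for ch in term:
            node = node["kids"].setdefault(
                ch, {"max": -1, "kids": {}, "version": None})
            node["max"] = max(node["max"], version)
        node["version"] = version

    def has_newer(self, term, version):
        """True only if `term` exists with a version greater than
        `version`; answers False early via the subtree max."""
        node = self.root
        for ch in term:
            if node["max"] <= version:
                return False  # nothing newer anywhere below: skip the seek
            node = node["kids"].get(ch)
            if node is None:
                return False
        return node["version"] is not None and node["version"] > version

idx = MaxVersionTrie()
idx.add("doc1", 5)
print(idx.has_newer("doc1", 3), idx.has_newer("doc1", 9))  # → True False
```

The second lookup never walks past the root: the whole-index max (5) is not above 9, which is exactly the "no term T with version > V" fast path described above.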






[jira] [Created] (LUCENE-5708) Remove IndexWriterConfig.clone

2014-05-27 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5708:
--

 Summary: Remove IndexWriterConfig.clone
 Key: LUCENE-5708
 URL: https://issues.apache.org/jira/browse/LUCENE-5708
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0


We originally added this clone to allow a single IWC to be re-used against more 
than one IndexWriter, but I think this is a mis-feature: it adds complexity to 
hairy classes (merge policy/scheduler, DW thread pool, etc.), and I think it's 
buggy today.

I think we should just disallow sharing: you must make a new IWC for a new 
IndexWriter.







[jira] [Updated] (LUCENE-5708) Remove IndexWriterConfig.clone

2014-05-27 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5708:
---

Attachment: LUCENE-5708.patch

Initial patch, tests seem to pass.  IWC already detects if it's illegally 
re-used across more than one IW.

 Remove IndexWriterConfig.clone
 --

 Key: LUCENE-5708
 URL: https://issues.apache.org/jira/browse/LUCENE-5708
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5708.patch


 We originally added this clone to allow a single IWC to be re-used against 
 more than one IndexWriter, but I think this is a mis-feature: it adds 
 complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), I 
 think it's buggy today.
 I think we should just disallow sharing: you must make a new IWC for a new 
 IndexWriter.






[jira] [Updated] (LUCENE-5708) Remove IndexWriterConfig.clone

2014-05-27 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5708:
---

Attachment: LUCENE-5708.patch

Woops, wrong patch ... this one should work.

 Remove IndexWriterConfig.clone
 --

 Key: LUCENE-5708
 URL: https://issues.apache.org/jira/browse/LUCENE-5708
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5708.patch, LUCENE-5708.patch


 We originally added this clone to allow a single IWC to be re-used against 
 more than one IndexWriter, but I think this is a mis-feature: it adds 
 complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), I 
 think it's buggy today.
 I think we should just disallow sharing: you must make a new IWC for a new 
 IndexWriter.






Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 559 - Still Failing

2014-05-27 Thread Robert Muir
I think it's the Chuck Norris object: and he can do this even without G1GC

On Tue, May 27, 2014 at 11:09 AM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:
 java.lang.ArrayStoreException: unknown
 at 
 __randomizedtesting.SeedInfo.seed([97ABA0C8320EBCCB:E5E785C7836E0AB8]:0)
 at 
 org.apache.lucene.util.RamUsageEstimator$IdentityHashSet.add(RamUsageEstimator.java:674)

 Oh boy... what the hell is this? :)

 D.




[jira] [Updated] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies

2014-05-27 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5973:
-

Attachment: SOLR-5973.patch

A new patch. Solves issues that were found while working on SOLR-6088.

 Pluggable Ranking Collectors and Merge Strategies
 -

 Key: SOLR-5973
 URL: https://issues.apache.org/jira/browse/SOLR-5973
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.9

 Attachments: SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch


 This ticket introduces a new RankQuery and MergeStrategy to Solr. By 
 extending the RankQuery class, and implementing its interface, you can 
 specify a custom ranking collector (TopDocsCollector) and distributed merge 
 strategy for a Solr query. 
 Sample syntax:
 {code}
 q={!customRank subquery=*:* param1=a param2=b}&wt=json&indent=true
 {code}
 In the sample above the param: {code}q={!customRank subquery=*:* param1=a 
 param2=b}{code} points to a QParserPlugin that returns a Query that extends 
 RankQuery.  The RankQuery defines the custom ranking and merge strategy for 
 its subquery.
 The RankQuery impl will have to do several things:
 1) Implement the RankQuery interface.
 2) Wrap the subquery and proxy all calls to the Query interface to the 
 subquery. Using local params syntax the subquery can be any valid Solr query. 
 The custom QParserPlugin is responsible for parsing the subquery.
 3)  Implement hashCode() and equals() so the queryResultCache works properly 
 with subquery and custom ranking algorithm. 






[jira] [Updated] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies

2014-05-27 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5973:
-

Description: 
This ticket introduces a new RankQuery and MergeStrategy to Solr. By extending 
the RankQuery class, and implementing its interface, you can specify a custom 
ranking collector (TopDocsCollector) and distributed merge strategy for a Solr 
query. 



Sample syntax:

{code}
q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
{code}
In the sample above the param: {code}rq={!customRank  param1=a param2=b}{code} 
points to a QParserPlugin that returns a Query that extends RankQuery.  The 
RankQuery defines the custom ranking and merge strategy for the main query.

The RankQuery impl will have to do several things:

1) Implement the getTopDocsCollector() method to return a custom top docs 
ranking collector.
2) Implement the wrap() method. The QueryComponent calls the wrap() method to 
wrap the RankQuery around the main query. This design allows the RankQuery to 
manage Query caching issues and implement custom Query explanations if needed.
3)  Implement hashCode() and equals() so the queryResultCache works properly 
with main query and custom ranking algorithm.
 4) Optionally implement a custom MergeStrategy to handle the merging of 
distributed results from the shards.




  was:
This ticket introduces a new RankQuery and MergeStrategy to Solr. By extending 
the RankQuery class, and implementing its interface, you can specify a custom 
ranking collector (TopDocsCollector) and distributed merge strategy for a Solr 
query. 



Sample syntax:

{code}
q={!customRank subquery=*:* param1=a param2=b}&wt=json&indent=true
{code}
In the sample above the param: {code}q={!customRank subquery=*:* param1=a 
param2=b}{code} points to a QParserPlugin that returns a Query that extends 
RankQuery.  The RankQuery defines the custom ranking and merge strategy for 
its subquery.

The RankQuery impl will have to do several things:

1) Implement the RankQuery interface.
2) Wrap the subquery and proxy all calls to the Query interface to the 
subquery. Using local params syntax the subquery can be any valid Solr query. 
The custom QParserPlugin is responsible for parsing the subquery.
3)  Implement hashCode() and equals() so the queryResultCache works properly 
with subquery and custom ranking algorithm. 





 Pluggable Ranking Collectors and Merge Strategies
 -

 Key: SOLR-5973
 URL: https://issues.apache.org/jira/browse/SOLR-5973
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.9

 Attachments: SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch


 This ticket introduces a new RankQuery and MergeStrategy to Solr. By 
 extending the RankQuery class, and implementing its interface, you can 
 specify a custom ranking collector (TopDocsCollector) and distributed merge 
 strategy for a Solr query. 
 Sample syntax:
 {code}
 q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
 {code}
 In the sample above the param: {code}rq={!customRank  param1=a 
 param2=b}{code} points to a QParserPlugin that returns a Query that extends 
 RankQuery.  The RankQuery defines the custom ranking and merge strategy for 
 the main query.
 The RankQuery impl will have to do several things:
 1) Implement the getTopDocsCollector() method to return a custom top docs 
 ranking collector.
 2) Implement the wrap() method. The QueryComponent calls the wrap() method to 
 wrap the RankQuery around the main query. This design allows the RankQuery to 
 manage Query caching issues and implement custom Query explanations if needed.
 3)  Implement hashCode() and equals() so the queryResultCache works properly 
 with main query and custom ranking algorithm.
  4) Optionally implement a custom MergeStrategy to handle the merging of 
 distributed results from the shards.






[jira] [Updated] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies

2014-05-27 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5973:
-

Description: 
This ticket introduces a new RankQuery and MergeStrategy to Solr. By extending 
the RankQuery class, and implementing its interface, you can specify a custom 
ranking collector (TopDocsCollector) and distributed merge strategy for a Solr 
query. 



Sample syntax:

{code}
q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
{code}
In the sample above the new rq (rank query) param: {code}rq={!customRank  
param1=a param2=b}{code} points to a QParserPlugin that returns a Query that 
extends RankQuery.  The RankQuery defines the custom ranking and merge strategy 
for the main query.

The RankQuery impl will have to do several things:

1) Implement the getTopDocsCollector() method to return a custom top docs 
ranking collector.
2) Implement the wrap() method. The QueryComponent calls the wrap() method to 
wrap the RankQuery around the main query. This design allows the RankQuery to 
manage Query caching issues and implement custom Query explanations if needed.
3)  Implement hashCode() and equals() so the queryResultCache works properly 
with main query and custom ranking algorithm.
 4) Optionally implement a custom MergeStrategy to handle the merging of 
distributed results from the shards.




  was:
This ticket introduces a new RankQuery and MergeStrategy to Solr. By extending 
the RankQuery class, and implementing its interface, you can specify a custom 
ranking collector (TopDocsCollector) and distributed merge strategy for a Solr 
query. 



Sample syntax:

{code}
q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
{code}
In the sample above the param: {code}rq={!customRank  param1=a param2=b}{code} 
points to a QParserPlugin that returns a Query that extends RankQuery.  The 
RankQuery defines the custom ranking and merge strategy for the main query.

The RankQuery impl will have to do several things:

1) Implement the getTopDocsCollector() method to return a custom top docs 
ranking collector.
2) Implement the wrap() method. The QueryComponent calls the wrap() method to 
wrap the RankQuery around the main query. This design allows the RankQuery to 
manage Query caching issues and implement custom Query explanations if needed.
3)  Implement hashCode() and equals() so the queryResultCache works properly 
with main query and custom ranking algorithm.
 4) Optionally implement a custom MergeStrategy to handle the merging of 
distributed results from the shards.





 Pluggable Ranking Collectors and Merge Strategies
 -

 Key: SOLR-5973
 URL: https://issues.apache.org/jira/browse/SOLR-5973
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.9

 Attachments: SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch


 This ticket introduces a new RankQuery and MergeStrategy to Solr. By 
 extending the RankQuery class and implementing its interface, you can 
 specify a custom ranking collector (TopDocsCollector) and distributed merge 
 strategy for a Solr query. 
 Sample syntax:
 {code}
 q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
 {code}
 In the sample above the new rq (rank query) param: {code}rq={!customRank  
 param1=a param2=b}{code} points to a QParserPlugin that returns a Query that 
 extends RankQuery.  The RankQuery defines the custom ranking and merge 
 strategy for the main query.
 The RankQuery impl will have to do several things:
 1) Implement the getTopDocsCollector() method to return a custom top docs 
 ranking collector.
 2) Implement the wrap() method. The QueryComponent calls the wrap() method to 
 wrap the RankQuery around the main query. This design allows the RankQuery to 
 manage Query caching issues and implement custom Query explanations if needed.
 3)  Implement hashCode() and equals() so the queryResultCache works properly 
 with main query and custom ranking algorithm.
  4) Optionally implement a custom MergeStrategy to handle the merging of 
 distributed results from the shards.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies

2014-05-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009648#comment-14009648
 ] 

ASF subversion and git services commented on SOLR-5973:
---

Commit 1597775 from [~joel.bernstein] in branch 'dev/trunk'
[ https://svn.apache.org/r1597775 ]

SOLR-5973: Pluggable Ranking Collectors and Merge Strategies

 Pluggable Ranking Collectors and Merge Strategies
 -

 Key: SOLR-5973
 URL: https://issues.apache.org/jira/browse/SOLR-5973
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.9

 Attachments: SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch


 This ticket introduces a new RankQuery and MergeStrategy to Solr. By 
 extending the RankQuery class and implementing its interface, you can 
 specify a custom ranking collector (TopDocsCollector) and distributed merge 
 strategy for a Solr query. 
 Sample syntax:
 {code}
 q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
 {code}
 In the sample above the new rq (rank query) param: {code}rq={!customRank  
 param1=a param2=b}{code} points to a QParserPlugin that returns a Query that 
 extends RankQuery.  The RankQuery defines the custom ranking and merge 
 strategy for the main query.
 The RankQuery impl will have to do several things:
 1) Implement the getTopDocsCollector() method to return a custom top docs 
 ranking collector.
 2) Implement the wrap() method. The QueryComponent calls the wrap() method to 
 wrap the RankQuery around the main query. This design allows the RankQuery to 
 manage Query caching issues and implement custom Query explanations if needed.
 3)  Implement hashCode() and equals() so the queryResultCache works properly 
 with main query and custom ranking algorithm.
  4) Optionally implement a custom MergeStrategy to handle the merging of 
 distributed results from the shards.






[jira] [Commented] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)

2014-05-27 Thread Eyal Zaidman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009681#comment-14009681
 ] 

Eyal Zaidman commented on SOLR-6113:


I'm a little confused by that behavior, assuming I understand the technical 
details. The real-world scenario uf is trying to address is disallowing or 
restricting search in some fields. For example, if I wanted to implement a 
permissions scheme, I could tell it -restrictedField and it would not search 
there. By treating that search as a literal (presumably because we can't detect 
whether the user meant a fielded search or, say, a Swedish term that exactly 
matches a Solr field name), we're prioritizing the less common, rather esoteric 
(IMO) scenario.
Adding to that Liram's comment about the relation to SOLR-2649: the default 
operator behavior could make this even worse, where instead of OR you get AND 
behavior, and all searches fail due to forcing a non-existent literal match.

Do you think it would make sense to add functionality that removes that part of 
the search query instead of escaping it? We could of course add a flag for 
preserving the old behavior in case someone finds it useful.

Could you point us in the right direction if so? We'd be happy to attempt a 
patch.

 Edismax doesn't parse well the query uf (User Fields)
 -

 Key: SOLR-6113
 URL: https://issues.apache.org/jira/browse/SOLR-6113
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Reporter: Liram Vardi

 It seems that the Edismax User Fields feature does not behave as expected.
 For instance, assuming the following query:
 _q=id:b* user:"Anna Collins"&defType=edismax&uf=* -user&rows=0_
 The parsed query (taken from query debug info) is:
 _+((id:b* (text:user) (text:"anna collins"))~1)_
 I expect that because "user" was filtered out in uf (User Fields), the 
 parsed query should not contain the "user" search part.
 In other words, the parsed query should look simply like this: _+id:b*_
 This issue is affected by the patch on issue SOLR-2649: When changing the 
 default OP of Edismax to AND, the query results change.






[jira] [Commented] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies

2014-05-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009697#comment-14009697
 ] 

ASF subversion and git services commented on SOLR-5973:
---

Commit 1597796 from [~joel.bernstein] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1597796 ]

SOLR-5973: Pluggable Ranking Collectors and Merge Strategies

 Pluggable Ranking Collectors and Merge Strategies
 -

 Key: SOLR-5973
 URL: https://issues.apache.org/jira/browse/SOLR-5973
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.9

 Attachments: SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch


 This ticket introduces a new RankQuery and MergeStrategy to Solr. By 
 extending the RankQuery class and implementing its interface, you can 
 specify a custom ranking collector (TopDocsCollector) and distributed merge 
 strategy for a Solr query. 
 Sample syntax:
 {code}
 q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
 {code}
 In the sample above the new rq (rank query) param: {code}rq={!customRank  
 param1=a param2=b}{code} points to a QParserPlugin that returns a Query that 
 extends RankQuery.  The RankQuery defines the custom ranking and merge 
 strategy for the main query.
 The RankQuery impl will have to do several things:
 1) Implement the getTopDocsCollector() method to return a custom top docs 
 ranking collector.
 2) Implement the wrap() method. The QueryComponent calls the wrap() method to 
 wrap the RankQuery around the main query. This design allows the RankQuery to 
 manage Query caching issues and implement custom Query explanations if needed.
 3)  Implement hashCode() and equals() so the queryResultCache works properly 
 with main query and custom ranking algorithm.
  4) Optionally implement a custom MergeStrategy to handle the merging of 
 distributed results from the shards.






[jira] [Commented] (LUCENE-5708) Remove IndexWriterConfig.clone

2014-05-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009722#comment-14009722
 ] 

Shai Erera commented on LUCENE-5708:


I think the way you fixed some tests that used clone is incorrect. You should 
at least call {{newIndexWriterConfig(random)}} with the same random and seed, 
so the exact same IWC is created each time. At least, that's what these tests 
now rely on, even if they don't break. Otherwise, they just create a random IWC 
each time they open a writer, which is not the intention, I believe.
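The seed-reuse point can be illustrated with a generic sketch. IwcSketch and its config knobs below are hypothetical stand-ins, not Lucene's actual IndexWriterConfig; the point is only that constructing from two Randoms with the same seed yields identical configs:

```java
// Hypothetical stand-in for a randomized config: same seed, same config.
import java.util.Random;

class IwcSketch {
    final int mergeFactor;      // example knob picked from the random
    final double ramBufferMB;   // another example knob

    IwcSketch(Random r) {
        // Draw the knobs in a fixed order, so two Randoms seeded identically
        // produce bit-for-bit identical configs.
        mergeFactor = 2 + r.nextInt(18);
        ramBufferMB = 16 + r.nextDouble() * 48;
    }
}
```

This is why re-seeding with the test's own seed, rather than drawing fresh randomness, reproduces the exact same configuration on every writer open.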

 Remove IndexWriterConfig.clone
 --

 Key: LUCENE-5708
 URL: https://issues.apache.org/jira/browse/LUCENE-5708
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5708.patch, LUCENE-5708.patch


 We originally added this clone to allow a single IWC to be re-used against 
 more than one IndexWriter, but I think this is a mis-feature: it adds 
 complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), I 
 think it's buggy today.
 I think we should just disallow sharing: you must make a new IWC for a new 
 IndexWriter.






[jira] [Commented] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)

2014-05-27 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009733#comment-14009733
 ] 

Jack Krupansky commented on SOLR-6113:
--

Better documentation of the intended behavior would help, at least a little; 
then we could point people to a clear description of what actually happens.


 Edismax doesn't parse well the query uf (User Fields)
 -

 Key: SOLR-6113
 URL: https://issues.apache.org/jira/browse/SOLR-6113
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Reporter: Liram Vardi

 It seems that the Edismax User Fields feature does not behave as expected.
 For instance, assuming the following query:
 _q=id:b* user:"Anna Collins"&defType=edismax&uf=* -user&rows=0_
 The parsed query (taken from query debug info) is:
 _+((id:b* (text:user) (text:"anna collins"))~1)_
 I expect that because "user" was filtered out in uf (User Fields), the 
 parsed query should not contain the "user" search part.
 In other words, the parsed query should look simply like this: _+id:b*_
 This issue is affected by the patch on issue SOLR-2649: When changing the 
 default OP of Edismax to AND, the query results change.






[jira] [Commented] (LUCENE-5708) Remove IndexWriterConfig.clone

2014-05-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009824#comment-14009824
 ] 

Michael McCandless commented on LUCENE-5708:


Hmm which tests rely on using the same IWC?  I thought I was improving the 
tests by switching up the config...

 Remove IndexWriterConfig.clone
 --

 Key: LUCENE-5708
 URL: https://issues.apache.org/jira/browse/LUCENE-5708
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5708.patch, LUCENE-5708.patch


 We originally added this clone to allow a single IWC to be re-used against 
 more than one IndexWriter, but I think this is a mis-feature: it adds 
 complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), I 
 think it's buggy today.
 I think we should just disallow sharing: you must make a new IWC for a new 
 IndexWriter.






[jira] [Commented] (LUCENE-5708) Remove IndexWriterConfig.clone

2014-05-27 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009930#comment-14009930
 ] 

Adrien Grand commented on LUCENE-5708:
--

I don't know about these tests that expect the same config, but I'm +1 in 
general to remove all that cloning. It looks to me like we should be able to 
make some fields final now that we don't have a clone method anymore (e.g. 
MergePolicy.writer)?

 Remove IndexWriterConfig.clone
 --

 Key: LUCENE-5708
 URL: https://issues.apache.org/jira/browse/LUCENE-5708
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5708.patch, LUCENE-5708.patch


 We originally added this clone to allow a single IWC to be re-used against 
 more than one IndexWriter, but I think this is a mis-feature: it adds 
 complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), I 
 think it's buggy today.
 I think we should just disallow sharing: you must make a new IWC for a new 
 IndexWriter.






[jira] [Assigned] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations

2014-05-27 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-6091:


Assignee: Noble Paul  (was: Shalin Shekhar Mangar)

 Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
 ---

 Key: SOLR-6091
 URL: https://issues.apache.org/jira/browse/SOLR-6091
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7, 4.8
Reporter: Shalin Shekhar Mangar
Assignee: Noble Paul
 Fix For: 4.9, 5.0

 Attachments: SOLR-6091.patch


 When using the overseer roles feature, there is a possibility of more than 
 one thread executing the prioritizeOverseerNodes method and extra QUIT 
 commands being inserted into the overseer queue.
 At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a 
 race condition.
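As a rough illustration of the proposed minimum fix, a synchronized method guarantees at most one QUIT is enqueued even if several threads race. The class and queue below are hypothetical stand-ins, not Solr's actual overseer code:

```java
// Sketch of the race-condition fix: mutual exclusion around the step that
// decides to enqueue a QUIT operation. All names here are illustrative.
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class OverseerPrioritizerSketch {
    private final Queue<String> overseerQueue = new ConcurrentLinkedQueue<>();

    // synchronized: only one thread at a time may inspect the queue and decide
    // to ask the current overseer to quit, so duplicate QUITs can't slip in
    // between the check and the add.
    synchronized void prioritizeOverseerNodes(boolean leaderHasRole) {
        if (!leaderHasRole && !overseerQueue.contains("QUIT")) {
            overseerQueue.add("QUIT");
        }
    }

    int quitCount() {
        return (int) overseerQueue.stream().filter("QUIT"::equals).count();
    }
}
```

Without the synchronized keyword, two threads could both pass the contains() check before either adds, producing the extra QUIT operations described above.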






[jira] [Assigned] (SOLR-6095) SolrCloud cluster can end up without an overseer

2014-05-27 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-6095:


Assignee: Noble Paul

 SolrCloud cluster can end up without an overseer
 

 Key: SOLR-6095
 URL: https://issues.apache.org/jira/browse/SOLR-6095
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.8
Reporter: Shalin Shekhar Mangar
Assignee: Noble Paul
 Fix For: 4.9, 5.0


 We have a large cluster running on ec2 which occasionally ends up without an 
 overseer after a rolling restart. We always restart our overseer nodes last of 
 all; otherwise we end up with a large number of shards that can't recover 
 properly.
 This cluster is running a custom branch forked from 4.8 and has SOLR-5473, 
 SOLR-5495 and SOLR-5468 applied. We have a large number of small collections 
 (120 collections each with approx 5M docs) on 16 Solr nodes. We are also 
 using the overseer roles feature to designate two specified nodes as 
 overseers. However, I think the problem that we're seeing is not specific to 
 the overseer roles feature.
 As soon as the overseer was shut down, we saw the following on the node which 
 was next in line to become the overseer:
 {code}
 2014-05-20 09:55:39,261 [main-EventThread] INFO  solr.cloud.ElectionContext  
 - I am going to be the leader ec2-xx.compute-1.amazonaws.com:8987_solr
 2014-05-20 09:55:39,265 [main-EventThread] WARN  solr.cloud.LeaderElector  - 
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists for /overseer_elect/leader
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
   at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
   at 
 org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:432)
   at 
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
   at 
 org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:429)
   at 
 org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:386)
   at 
 org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:373)
   at 
 org.apache.solr.cloud.OverseerElectionContext.runLeaderProcess(ElectionContext.java:551)
   at 
 org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:142)
   at 
 org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:110)
   at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
   at 
 org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:303)
   at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
 {code}
 When the overseer leader node is gracefully shut down, we get the following in 
 the logs:
 {code}
 2014-05-20 09:55:39,254 [Thread-63] ERROR solr.cloud.Overseer  - Exception in 
 Overseer main queue loop
 org.apache.solr.common.SolrException: Could not load collection from ZK:sm12
   at 
 org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:778)
   at 
 org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:553)
   at 
 org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:246)
   at 
 org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:237)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.InterruptedException
   at java.lang.Object.wait(Native Method)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1040)
   at 
 org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:226)
   at 
 org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:223)
   at 
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
   at 
 org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:223)
   at 
 org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:767)
   ... 4 more
 2014-05-20 09:55:39,254 [Thread-63] INFO  solr.cloud.Overseer  - Overseer 
 Loop exiting : ec2-xx.compute-1.amazonaws.com:8986_solr
 2014-05-20 09:55:39,256 [main-EventThread] WARN  common.cloud.ZkStateReader  
 - ZooKeeper watch triggered, but Solr cannot talk to ZK
 2014-05-20 09:55:39,259 [ShutdownMonitor] INFO  server.handler.ContextHandler 
  - stopped 
 

[jira] [Commented] (LUCENE-5708) Remove IndexWriterConfig.clone

2014-05-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009988#comment-14009988
 ] 

Shai Erera commented on LUCENE-5708:


bq. Hmm which tests rely on using the same IWC?

Hmm ... I don't remember. All I remember is that while I worked on preventing 
sharing IWC between writers (LUCENE-4876), there were a bunch of tests that 
reused the IWC. I fixed them by simply cloning it, but I admit I didn't check 
if initializing a new IWC each time serves their purpose. I just assume that if 
so many tests did that, there ought to be a reason beyond just convenience, but 
I could be wrong.

What I'm worried about is that by not cloning, Jenkins will trip (which is good!), or 
worse - that those tests will stop asserting what they asserted before. So I 
just wanted to point that out. If we're ready to take the risk, I'm fine with 
it, because eventually we're discussing tests here .. there's nothing 
functionally missing from an app's perspective.

 Remove IndexWriterConfig.clone
 --

 Key: LUCENE-5708
 URL: https://issues.apache.org/jira/browse/LUCENE-5708
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5708.patch, LUCENE-5708.patch


 We originally added this clone to allow a single IWC to be re-used against 
 more than one IndexWriter, but I think this is a mis-feature: it adds 
 complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), I 
 think it's buggy today.
 I think we should just disallow sharing: you must make a new IWC for a new 
 IndexWriter.






[jira] [Comment Edited] (LUCENE-5708) Remove IndexWriterConfig.clone

2014-05-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009988#comment-14009988
 ] 

Shai Erera edited comment on LUCENE-5708 at 5/27/14 5:56 PM:
-

bq. Hmm which tests rely on using the same IWC?

Hmm ... I don't remember. All I remember is that while I worked on preventing 
sharing IWC between writers (LUCENE-4876), there were a bunch of tests that 
reused the IWC. I fixed them by simply cloning it, but I admit I didn't check 
if initializing a new IWC each time serves their purpose. I just assume that if 
so many tests did that, there ought to be a reason beyond just convenience, but 
I could be wrong.

What I'm worried about is that by not cloning, Jenkins will trip (which is good!), or 
worse - that those tests will stop asserting what they asserted before. So I 
just wanted to point that out. If we're ready to take the risk, I'm fine with 
it, because eventually we're discussing tests here .. there's nothing 
functionally impossible from an app's perspective.


was (Author: shaie):
bq. Hmm which tests rely on using the same IWC?

Hmm ... I don't remember. All I remember is that while I worked on preventing 
sharing IWC between writers (LUCENE-4876), there were a bunch of tests that 
reused the IWC. I fixed them by simply cloning it, but I admit I didn't check 
if initializing a new IWC each time serves their purpose. I just assume that if 
so many tests did that, there ought to be a reason beyond just convenience, but 
I could be wrong.

What I'm worried about is that by not cloning, Jenkins will trip (which is good!), or 
worse - that those tests will stop asserting what they asserted before. So I 
just wanted to point that out. If we're ready to take the risk, I'm fine with 
it, because eventually we're discussing tests here .. there's nothing 
functionally missing from an app's perspective.

 Remove IndexWriterConfig.clone
 --

 Key: LUCENE-5708
 URL: https://issues.apache.org/jira/browse/LUCENE-5708
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.9, 5.0

 Attachments: LUCENE-5708.patch, LUCENE-5708.patch


 We originally added this clone to allow a single IWC to be re-used against 
 more than one IndexWriter, but I think this is a mis-feature: it adds 
 complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), I 
 think it's buggy today.
 I think we should just disallow sharing: you must make a new IWC for a new 
 IndexWriter.






[jira] [Created] (SOLR-6114) cron for elgg_solr reindex

2014-05-27 Thread Jagdish Bairagi (JIRA)
Jagdish Bairagi created SOLR-6114:
-

 Summary: cron for elgg_solr reindex
 Key: SOLR-6114
 URL: https://issues.apache.org/jira/browse/SOLR-6114
 Project: Solr
  Issue Type: Bug
  Components: clients - php
Affects Versions: 4.8.1
 Environment: Elgg+elgg_solr plugin, Nginx with Centos
Reporter: Jagdish Bairagi
 Fix For: 4.8.1


Hi,

we have set up Solr on our server and added the elgg_solr plugin to our Elgg 
site, and the plugin is working fine with the locally installed Solr when 
indexing manually.

But I need a cron job for reindexing using the elgg_solr plugin.

I followed the doc below, but I can't see which cron job needs to be run, or 
where it is:
https://github.com/arckinteractive/elgg_solr

 Installation

Install to elgg mod directory as 'elgg_solr'.

Enable the plugin in the Admin page, move to position below the search plugin.

Configure Solr with the schema.xml included in the root directory of this 
plugin.

Enter and save the connection information on the plugin settings page.

Trigger a reindex from the plugin setting page.

Ensure hourly cron is configured and active


So please help me; I am looking for the cron job to set up for automatic 
reindexing. What should the cron entry be?

Thanks,
Jagdish






[jira] [Closed] (SOLR-6114) cron for elgg_solr reindex

2014-05-27 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey closed SOLR-6114.
--

   Resolution: Invalid
Fix Version/s: (was: 4.8.1)

The elgg software package is not part of Solr.  This is a bug/feature tracker 
for Solr and other Apache projects.

Nobody here will have the required knowledge to help you.  You'll need to send 
a request to a support resource for elgg or their solr plugin.  It looks like 
you might be able to do this at the following location, after creating an 
account for their website:

http://community.elgg.org/groups/all?filter=support

Another possibility would be to open an issue against the solr plugin project 
asking them to include the required crontab entry in their README, which you 
could do at the following URL.  You probably need a github account to do this:

https://github.com/arckinteractive/elgg_solr/issues


 cron for elgg_solr reindex
 --

 Key: SOLR-6114
 URL: https://issues.apache.org/jira/browse/SOLR-6114
 Project: Solr
  Issue Type: Bug
  Components: clients - php
Affects Versions: 4.8.1
 Environment: Elgg+elgg_solr plugin, Nginx with Centos
Reporter: Jagdish Bairagi

 Hi,
 we have setup solr on our server and added a elgg_solr plugin to our elgg 
 site...and plugin is working fine with locally installed solr..and indexing 
 manually
 but i need a cron for indexing using elgg_solr plugin...
 as i followed the doc.:- but can't see which cron need to be run..where is 
 the cron. 
 https://github.com/arckinteractive/elgg_solr
  Installation
 Install to elgg mod directory as 'elgg_solr'.
 Enable the plugin in the Admin page, move to position below the search plugin.
 Configure Solr with the schema.xml included in the root directory of this 
 plugin.
 Enter and save the connection information on the plugin settings page.
 Trigger a reindex from the plugin setting page.
 Ensure hourly cron is configured and active
 So please help me..i am looking for cron to setup..for autoreindex. what 
 should be the cron...
 Thank,
 Jagdish






[jira] [Commented] (SOLR-6062) Phrase queries are created for each field supplied through edismax's pf, pf2 and pf3 parameters (rather than being combined in a single dismax query)

2014-05-27 Thread Michael Dodsworth (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010368#comment-14010368
 ] 

Michael Dodsworth commented on SOLR-6062:
-

adding [~jdyer] [~janhoy], as you were involved in 
https://issues.apache.org/jira/browse/SOLR-2058

 Phrase queries are created for each field supplied through edismax's pf, pf2 
 and pf3 parameters (rather than being combined in a single dismax query)
 -

 Key: SOLR-6062
 URL: https://issues.apache.org/jira/browse/SOLR-6062
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
 Attachments: combined-phrased-dismax.patch


 https://issues.apache.org/jira/browse/SOLR-2058 subtly changed how phrase 
 queries, created through the pf, pf2 and pf3 parameters, are merged into the 
 main user query.
 For the query: 'term1 term2' with pf2:[field1, field2, field3] we now get 
 (omitting the non-phrase query section for clarity):
 {code:java}
 main query
 DisjunctionMaxQuery((field1:"term1 term2"^1.0)~0.1)
 DisjunctionMaxQuery((field2:"term1 term2"^1.0)~0.1)
 DisjunctionMaxQuery((field3:"term1 term2"^1.0)~0.1)
 {code}
 Prior to this change, we had:
 {code:java}
 main query 
 DisjunctionMaxQuery((field1:"term1 term2"^1.0 | field2:"term1 term2"^1.0 | 
 field3:"term1 term2"^1.0)~0.1)
 {code}
 The upshot being that if the phrase query "term1 term2" appears in multiple 
 fields, it will get a significant boost over the previous implementation.
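The boost difference can be seen with a toy score calculation using made-up per-field scores and a tie breaker of 0.1. This mirrors DisjunctionMaxQuery's max-plus-tiebreak formula versus simply summing per-field disjunctions; the numbers are illustrative, not real Lucene scores:

```java
// Toy illustration of the scoring change described above.
class DismaxScoreSketch {
    // One DisjunctionMaxQuery over all fields (the pre-SOLR-2058 structure):
    // score = max + tie * sum(other field scores).
    static double combined(double tie, double... fieldScores) {
        double max = 0, sum = 0;
        for (double s : fieldScores) { max = Math.max(max, s); sum += s; }
        return max + tie * (sum - max);
    }

    // A separate single-field DisjunctionMaxQuery per field (the current
    // structure): each field's phrase match contributes its full score,
    // so a phrase matching in many fields adds up.
    static double perField(double... fieldScores) {
        double total = 0;
        for (double s : fieldScores) total += s;
        return total;
    }
}
```

With a phrase scoring 2.0 in each of three fields, the combined form yields 2.0 + 0.1 * 4.0 = 2.4, while the per-field form yields 6.0, which is the "significant boost" the reporter observed.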






[jira] [Commented] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)

2014-05-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010443#comment-14010443
 ] 

Jan Høydahl commented on SOLR-6113:
---

Suggestions for improvement welcome. In my opinion, simply discarding the whole 
term is also very confusing. Imagine {{uf=-title&q=title:java language}} - as 
we are now, you will require the three words "title", "java", "language". If we 
remove the first term, we'll match all docs with "language", clearly not the intention.

A perhaps better solution could be to search literally for the string 
"title:java" instead of breaking it into two?

Alternatively, disregard the field part but let the value part stay and be 
subject to qf or df. However, that could be confusing too, if the user won't 
get any feedback whatsoever that his query term "java" was actually not 
restricted to "title" only.
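The alternatives under discussion can be sketched as a toy term rewriter (plain Python; the whitespace/colon splitting is deliberately naive and is not how the edismax parser actually tokenizes - it is only here to name the three options side by side):

```python
def rewrite_disallowed_term(term, disallowed):
    """Toy illustration of the options discussed for a 'field:value' term
    whose field is excluded via uf (e.g. uf=-title -> disallowed={'title'}).
    """
    field, _, value = term.partition(":")
    if not value or field not in disallowed:
        return term  # allowed field, or no field part at all: keep as-is
    return {
        "drop": None,              # discard the whole term (what the reporter expects)
        "literal": '"%s"' % term,  # search literally for the string "title:java"
        "value_only": value,       # keep "java" and let qf/df handle it
    }

print(rewrite_disallowed_term("title:java", {"title"}))
print(rewrite_disallowed_term("body:java", {"title"}))
```

Each option trades off surprise differently, which is the crux of the comment above: dropping silently widens the match set, while keeping the value without feedback silently ignores the field restriction.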

 Edismax doesn't parse well the query uf (User Fields)
 -

 Key: SOLR-6113
 URL: https://issues.apache.org/jira/browse/SOLR-6113
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Reporter: Liram Vardi
  Labels: edismax

 It seems that the Edismax User Fields feature does not behave as expected.
 For instance, assuming the following query:
 _q=id:b* user:"Anna Collins"&defType=edismax&uf=* -user&rows=0_
 The parsed query (taken from the query debug info) is:
 _+((id:b* (text:user) (text:"anna collins"))~1)_
 I expect that because "user" was filtered out in uf (User Fields), the 
 parsed query should not contain the "user" search part.
 In other words, the parsed query should look simply like this: _+id:b*_
 This issue is affected by the patch on issue SOLR-2649: when changing the 
 default OP of Edismax to AND, the query results change.






[jira] [Updated] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)

2014-05-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-6113:
--

Issue Type: Improvement  (was: Bug)

 Edismax doesn't parse well the query uf (User Fields)
 -

 Key: SOLR-6113
 URL: https://issues.apache.org/jira/browse/SOLR-6113
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Liram Vardi
  Labels: edismax







[jira] [Updated] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)

2014-05-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-6113:
--

Labels: edismax  (was: )

 Edismax doesn't parse well the query uf (User Fields)
 -

 Key: SOLR-6113
 URL: https://issues.apache.org/jira/browse/SOLR-6113
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Liram Vardi
Priority: Minor
  Labels: edismax







[jira] [Updated] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)

2014-05-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-6113:
--

Priority: Minor  (was: Major)

 Edismax doesn't parse well the query uf (User Fields)
 -

 Key: SOLR-6113
 URL: https://issues.apache.org/jira/browse/SOLR-6113
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Liram Vardi
Priority: Minor
  Labels: edismax







Schemaless mode and solrconfig.xml

2014-05-27 Thread Erick Erickson
How do all the components that define a text field for various
purposes play nice in schemaless mode?

The df entry is commented out of the /select handler which at least
lets us get started.

But there are a variety of places where a "text" field is specified: the
/browse handler, spell checking, etc. What ideas exist or are already
under way for making it easier to make Solr run OOB in schemaless mode with
these bells and whistles? Or will it just be required that people edit
solrconfig.xml to reflect the fields that actually get created via
schemaless?

Or is the best solution just to take everything out of the
solrconfig.xml file that appears in the example-schemaless
configuration except, say the select handler?

Or pre-define a text field in the schemaless schema.xml file to handle these?

I'm not even sure we _could_ make things like the spell checker play
nice with schemaless. For the other handlers we can specify df= on
the URL once we know there's a field available, but that's tricky for
spell checking...

And I think browse is just totally out of the question since it's
coded up for the default schema. Pull that too?

Thoughts?
Erick




Re: Schemaless mode and solrconfig.xml

2014-05-27 Thread Gregory Chanan
Long term, it seems like the solution is to have a SolrConfig API that is
powerful enough to define things like df at runtime.  Definitely worth
discussing what we should do until then, though.

Or pre-define a text field in the schemaless schema.xml file to handle
 these?


I'd lean towards not defining any dfs in the schemaless solrconfigs over
this solution.  I'd imagine if we define "text" in the schemaless
schema.xml, a user would do the following:
- index some data
- query, get no results, and be confused

Alternatively, if we don't define any dfs, the pattern would be:
- index some data
- query and get an error about the field not being specified or df not defined
From there the user would at least have a starting point and could decide to
specify the df in the query, specify the field, or edit the solrconfig.xml
and restart the service.  That seems like it would result in less
unexpected behavior and fewer user questions.

Note, this means we should probably get rid of the df for /query in the
schemaless example.



 I'm not even sure we _could_ make things like the spell checker play
 nice with schemaless. For the other handlers we can specify df= on
 the URL once we know there's a field available, but that's tricky for
 spell checking...

 And I think browse is just totally out of the question since it's
 coded up for the default schema. Pull that too?


+1 on pulling both of those.


 Thoughts?
 Erick





[jira] [Commented] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies

2014-05-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010695#comment-14010695
 ] 

ASF subversion and git services commented on SOLR-5973:
---

Commit 1597921 from [~joel.bernstein] in branch 'dev/trunk'
[ https://svn.apache.org/r1597921 ]

SOLR-5973: Pluggable Ranking Collectors and Merge Strategies

 Pluggable Ranking Collectors and Merge Strategies
 -

 Key: SOLR-5973
 URL: https://issues.apache.org/jira/browse/SOLR-5973
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.9

 Attachments: SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, 
 SOLR-5973.patch


 This ticket introduces a new RankQuery and MergeStrategy to Solr. By 
 extending the RankQuery class and implementing its interface, you can 
 specify a custom ranking collector (TopDocsCollector) and distributed merge 
 strategy for a Solr query. 
 Sample syntax:
 {code}
 q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
 {code}
 In the sample above, the new rq (rank query) param: {code}rq={!customRank 
 param1=a param2=b}{code} points to a QParserPlugin that returns a Query that 
 extends RankQuery. The RankQuery defines the custom ranking and merge 
 strategy for the main query.
 The RankQuery impl will have to do several things:
 1) Implement the getTopDocsCollector() method to return a custom top docs 
 ranking collector.
 2) Implement the wrap() method. The QueryComponent calls the wrap() method to 
 wrap the RankQuery around the main query. This design allows the RankQuery to 
 manage Query caching issues and implement custom Query explanations if needed.
 3) Implement hashCode() and equals() so the queryResultCache works properly 
 with the main query and custom ranking algorithm.
 4) Optionally implement a custom MergeStrategy to handle the merging of 
 distributed results from the shards.
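As a rough illustration of what step 4's custom MergeStrategy is responsible for, here is a minimal k-way merge of per-shard ranked results in plain Python. Real implementations operate on Solr's shard responses, which this sketch does not attempt to model:

```python
import heapq

def merge_shard_results(shard_results, rows):
    """Merge per-shard (score, doc_id) lists, each already sorted by
    descending score, into a single top-'rows' ranking."""
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(merged)[:rows]

shard1 = [(9.1, "doc3"), (4.2, "doc7")]
shard2 = [(8.5, "doc1"), (6.0, "doc9"), (1.3, "doc2")]
print(merge_shard_results([shard1, shard2], rows=3))
# [(9.1, 'doc3'), (8.5, 'doc1'), (6.0, 'doc9')]
```

A custom strategy would replace this simple score ordering with whatever ranking the custom collector produced on each shard.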






[jira] [Commented] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies

2014-05-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010733#comment-14010733
 ] 

ASF subversion and git services commented on SOLR-5973:
---

Commit 1597923 from [~joel.bernstein] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1597923 ]

SOLR-5973: Pluggable Ranking Collectors and Merge Strategies

 Pluggable Ranking Collectors and Merge Strategies
 -

 Key: SOLR-5973
 URL: https://issues.apache.org/jira/browse/SOLR-5973
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 4.9



