Hudson build is back to normal: Solr-trunk #446

2008-05-20 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/446/changes




Release of SOLR 1.3

2008-05-20 Thread Andrew Savory
Hi,

(discussion moved from -user to -dev)

2008/5/19 Chris Hostetter [EMAIL PROTECTED]:

 If people are particularly eager to see a 1.3 release, the best thing to
 do is subscribe to solr-dev and start a dialog there about what issues
 people think are show stoppers for 1.3 and what assistance the various
 people working on those issues can use.

So, what are the show stoppers, how can we help, what can we reassign
to a future release?

Taking a look through the list, there are quite a few issues with patches
attached that aren't applied yet. Clearing these out would cut the
open bug count by almost half:

SOLR-515
SOLR-438
SOLR-351 (applied?)
SOLR-281 (applied?)
SOLR-424
SOLR-243 (stuck in review hell?)
SOLR-433
SOLR-510
SOLR-139
SOLR-521 (applied, waiting to be closed)
SOLR-284
SOLR-560
SOLR-469
SOLR-572
SOLR-565

It's a little weird to see patch 'development' going on in JIRA
(sometimes for over a year), rather than getting the patches into svn
and then working there... I'd worry that some valuable code history is
getting lost along the way? Yes, it's a tough call between adding
'bad' code and waiting for the perfect patch, but bad code creates
healthy communities and is better than no code :-)


Andrew.
--
[EMAIL PROTECTED] / [EMAIL PROTECTED]
http://www.andrewsavory.com/


[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-05-20 Thread Lars Kotthoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Kotthoff updated SOLR-303:
---

Attachment: solr-dist-faceting-non-ascii-all.patch

I've had a couple of issues with the current version. First, the facet queries 
which are sent to the other shards are posted in the URL, but aren't 
URL-encoded, i.e. during the refine stage anything non-ASCII results in facet 
counts for new values (i.e. the garbled version) coming back and causing NPEs 
when trying to update the counts.

Furthermore, a negative facet.limit isn't working as expected: instead of 
returning all facets it returns none. Also, facet.sort is not automatically 
enabled for negative values.

I've attached solr-dist-faceting-non-ascii-all.patch which fixes the above 
issues. Somebody who understands what everything is supposed to do should have 
a look over it though :)
For example I've found two linked hash maps in FacetInfo, topFacets and 
listFacets, which seem to serve the same purpose. Therefore I replaced them 
with a single hash map. It seems to work just fine this way.
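Lars's first fix concerns URL-encoding the facet values appended to shard request URLs. A minimal illustration of why encoding matters for non-ASCII terms, with hypothetical field and method names rather than the patch's actual code:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FacetParamEncoding {

    // Encode one facet refinement value before appending it to a shard
    // request URL; without this, non-ASCII terms arrive garbled and the
    // shard reports counts for values the coordinator has never seen.
    static String facetQueryParam(String field, String value) {
        try {
            return "facet.query=" + URLEncoder.encode(field + ":" + value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // A Japanese facet value only survives the round trip if encoded.
        System.out.println(facetQueryParam("title", "日本語"));
    }
}
```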

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed_add_tests_for_intended_behavior.patch, 
 distributed_facet_count_bugfix.patch, distributed_pjaol.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch, shards_qt.patch, solr-dist-faceting-non-ascii-all.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Release of SOLR 1.3

2008-05-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
+1
The code has changed radically between Solr 1.2 and Solr 1.3. Because
1.3 is not released, most of us have to stick to 1.2. So anything that
we build must work on 1.2, and if I wish to contribute back to Solr it
has to be 1.3-compatible. SOLR-469 is a good example, where I had to
really hack my code hard to ensure that the version-specific
dependencies were contained in one file.

This is a good starting point. Let us get the list of issues which
can be easily fixed, apply the patches, and push out a release.
--Noble


On Tue, May 20, 2008 at 2:23 PM, Andrew Savory [EMAIL PROTECTED] wrote:
 Hi,

 (discussion moved from -user to -dev)

 2008/5/19 Chris Hostetter [EMAIL PROTECTED]:

 If people are particularly eager to see a 1.3 release, the best thing to
 do is subscribe to solr-dev and start a dialog there about what issues
 people think are show stoppers for 1.3 and what assistance the various
 people working on those issues can use.

 So, what are the show stoppers, how can we help, what can we reassign
 to a future release?

 Taking a look through the list, there are quite a few issues with patches
 attached that aren't applied yet. Clearing these out would cut the
 open bug count by almost half:

 SOLR-515
 SOLR-438
 SOLR-351 (applied?)
 SOLR-281 (applied?)
 SOLR-424
 SOLR-243 (stuck in review hell?)
 SOLR-433
 SOLR-510
 SOLR-139
 SOLR-521 (applied, waiting to be closed)
 SOLR-284
 SOLR-560
 SOLR-469
 SOLR-572
 SOLR-565

 It's a little weird to see patch 'development' going on in JIRA
 (sometimes for over a year), rather than getting the patches into svn
 and then working there... I'd worry that some valuable code history is
 getting lost along the way? Yes, it's a tough call between adding
 'bad' code and waiting for the perfect patch, but bad code creates
 healthy communities and is better than no code :-)


 Andrew.
 --
 [EMAIL PROTECTED] / [EMAIL PROTECTED]
 http://www.andrewsavory.com/




-- 
--Noble Paul


how to add a new parameter to solr request

2008-05-20 Thread khirb7

Hello everybody,

I want to modify the behaviour of Solr a little and want to know if it
is possible. Here is my problem:
I give Solr documents to index whose UniqueKey field is based on the URL
and the time at which the crawler downloaded the page, so the UniqueKey is
a digest obtained as MyAlgo(Url + Time). The problem occurs at search
time: Solr returns results which contain duplicates, meaning for example
that the first 10 results correspond to the same web page with the same
content, because it is in fact the same URL. I want to remove this
duplication, so I want to add a parameter to the Solr request, for example
permitdupp, which takes the values true or false. If permitdupp=true I
will keep the default Solr behaviour, but if permitdupp=false I want to
remove all the duplicate documents and keep only the most recently indexed
one (to find the most recent, my documents contain a date field).
So I want to know the easiest way to do this: maybe there are Solr
parameters I have to use (faceting?), or programmatically, in which case
which classes do I have to modify or inherit from to develop this
solution?
Any suggestion is welcome, and thank you in advance.
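Pending a server-side parameter, one option is to post-process results on the client and keep only the newest document per URL. A minimal sketch, with a hypothetical Doc record standing in for one Solr result document:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeduplicateByUrl {

    // Minimal stand-in for one Solr result document, using the fields
    // described in the mail (id, DocUrl, and a date used as recency).
    static class Doc {
        final String id;
        final String url;
        final long indexedAt; // e.g. crawl time as epoch millis

        Doc(String id, String url, long indexedAt) {
            this.id = id;
            this.url = url;
            this.indexedAt = indexedAt;
        }
    }

    // Keep only the most recently indexed document per URL, preserving
    // the order in which URLs first appeared in the result list.
    static List<Doc> keepNewestPerUrl(List<Doc> results) {
        Map<String, Doc> newest = new LinkedHashMap<String, Doc>();
        for (Doc d : results) {
            Doc seen = newest.get(d.url);
            if (seen == null || d.indexedAt > seen.indexedAt) {
                newest.put(d.url, d);
            }
        }
        return new ArrayList<Doc>(newest.values());
    }
}
```

Note this only deduplicates within one page of results; duplicates split across pages would still need a server-side component.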







-- 
View this message in context: 
http://www.nabble.com/how-to-add-a-new-parameter-to-solr-request-tp17338190p17338190.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



[jira] Created: (SOLR-579) Extend SimplePost with RecurseDirectories, threads, document encoding , number of docs per commit

2008-05-20 Thread Patrick Debois (JIRA)
Extend SimplePost with RecurseDirectories, threads, document encoding , number 
of docs per commit
-

 Key: SOLR-579
 URL: https://issues.apache.org/jira/browse/SOLR-579
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.3
 Environment: Applies to all platforms
Reporter: Patrick Debois
Priority: Minor
 Fix For: 1.3


-When specifying a directory, SimplePost should also read the contents of the 
directory

New options for the commandline (some only useful in DATAMODE= files)
-RECURSEDIRS
Recursive read of directories as an option; this is useful for 
directories with a lot of files where the commandline expansion fails and xargs 
is too slow
-DOCENCODING (default = system encoding or UTF-8) 
For non-UTF-8 clients, SimplePost should include a way to set the 
encoding of the documents posted
-THREADSIZE (default = 1) 
For large volume posts, a thread pool makes sense, using the JDK 1.5 
thread pool model
-DOCSPERCOMMIT (default = 1)
Number of documents after which a commit is done, instead of only at 
the end

Note: the existing behaviour of the SimplePost tool should not be broken, as 
(post.sh) might be used in scripts 
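The proposed THREADSIZE and DOCSPERCOMMIT options could be sketched roughly as below, using the JDK 1.5 thread pool the issue mentions. All names are hypothetical, not SimplePostTool's actual code, and the HTTP POST itself is stubbed out:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadedPoster {
    private final ExecutorService pool;
    private final int docsPerCommit;
    private final AtomicInteger posted = new AtomicInteger();
    final List<String> log = Collections.synchronizedList(new ArrayList<String>());

    ThreadedPoster(int threadSize, int docsPerCommit) {
        this.pool = Executors.newFixedThreadPool(threadSize); // THREADSIZE
        this.docsPerCommit = docsPerCommit;                   // DOCSPERCOMMIT
    }

    // Stand-in for the actual HTTP POST to /update; here it only records
    // what it would have sent.
    void postDoc(String path) {
        log.add("post:" + path);
        if (posted.incrementAndGet() % docsPerCommit == 0) {
            log.add("commit"); // intermediate commit every DOCSPERCOMMIT docs
        }
    }

    void submit(final String path) {
        pool.execute(new Runnable() {
            public void run() { postDoc(path); }
        });
    }

    void finish() {
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        log.add("commit"); // final commit at the end, as before
    }
}
```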


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-580) Filter Query: Retrieve all docs with facets missing

2008-05-20 Thread Patrick Debois (JIRA)
Filter Query: Retrieve all docs with facets missing
---

 Key: SOLR-580
 URL: https://issues.apache.org/jira/browse/SOLR-580
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Patrick Debois
Priority: Minor


Consider this list

facetA - 10
facetB - 20
facets missing  - 30

For facetA and facetB it is easy to select the correct fq=FACET:value. But to 
be able to see the documents that have missing facets, one needs to specify a 
NOT fq= for every value in the facet.
Therefore a kind of shorthand would be useful to select all documents that 
have a facet missing. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-581) Typo fillInteristingTermsFromMLTQuery

2008-05-20 Thread Patrick Debois (JIRA)
Typo fillInteristingTermsFromMLTQuery
-

 Key: SOLR-581
 URL: https://issues.apache.org/jira/browse/SOLR-581
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.3
Reporter: Patrick Debois
Priority: Trivial


There is a typo in  MoreLikeThisHandler.java

fillInteristingTermsFromMLTQuery

should read

fillInterestingTermsFromMLTQuery



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-582) Field Aliasing

2008-05-20 Thread Patrick Debois (JIRA)
Field Aliasing
--

 Key: SOLR-582
 URL: https://issues.apache.org/jira/browse/SOLR-582
 Project: Solr
  Issue Type: New Feature
Reporter: Patrick Debois
Priority: Minor


XML documents that are indexed often use meaningful, full-blown names for their 
fields.

For power searching, shorthand for these terms would come in handy. 
This would also help for hard-to-remember values, where one could specify 
multiple names for the same field.
Also for multilingual queries this would be interesting.

I guess there should be a config file that is read by the query parser, 
substituting terms with their canonical values.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-564) Realtime search in Solr

2008-05-20 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598308#action_12598308
 ] 

Jason Rutherglen commented on SOLR-564:
---

After review, function queries' ValueSource should be fine because the values 
are not loaded from the field cache until the query is executed in the 
sub-searcher.  Sort is not mentioned because it is handled in the sub-searcher. 
 LUCENE-831 is a good step; however, if used for the top-level reader in a 
realtime system, large arrays will constantly be created for the top-level 
reader after every transaction.  This is why the work, meaning the query and 
the results, should be performed in the sub-searcher and then merged.  

SimpleFacets.getFieldCacheCounts should be placed in SolrIndexSearcher in 
order for it to be operable, which will be done in SOLR-567.

The Ocean Solr code patch will be attached to this issue.  

 Realtime search in Solr
 ---

 Key: SOLR-564
 URL: https://issues.apache.org/jira/browse/SOLR-564
 Project: Solr
  Issue Type: New Feature
  Components: replication, search
Affects Versions: 1.3
Reporter: Jason Rutherglen

 Before when I looked at this, the changes required to make Solr realtime 
 would seem to break the rest of Solr.  Is this still the case?  In project 
 Ocean http://code.google.com/p/oceansearch/ there is a realtime core however 
 integrating into Solr has looked like a redesign of the guts of Solr.  
 - Support for replication per update to transaction log
 - Custom realtime index creation
 - Filter and facet merging
 - Custom IndexSearcher that ties into realtime subsystem
 - Custom SolrCore that ties into realtime subsystem
 Is there a way to plug into these low level Solr functions without a massive 
 redesign?  A key area of concern is the doclist caching which is not used in 
 realtime search because after every update the doclists are no longer valid.  
 The doclist caching and handling is default in SolrCore.  That Ocean relies 
 on a custom threaded MultiSearcher rather than a single IndexSearcher is a 
 difficulty.  That DirectUpdateHandler2 works directly on IndexWriter is 
 problematic.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-567) SolrCore Pluggable

2008-05-20 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-567:
--

Attachment: solr-567.patch

solr-567.patch

Moved SimpleFacets.getFieldCacheCounts to SolrIndexSearcher to allow an 
alternate SolrCore to use a different implementation due to direct top level 
field cache access.

 SolrCore Pluggable
 --

 Key: SOLR-567
 URL: https://issues.apache.org/jira/browse/SOLR-567
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: Jason Rutherglen
 Attachments: solr-567.patch, solr-567.patch


 SolrCore needs to be an abstract class with the existing functionality in a 
 subclass.  SolrIndexSearcher the same.  It seems that most of the Searcher 
 methods in SolrIndexSearcher are not used.  The new abstract class need only 
 have the methods used by the other Solr classes.  This will allow other 
 indexing and search implementations to reuse the other parts of Solr.  Any 
 other classes that have functionality specific to the Solr implementation of 
 indexing and replication such as SolrConfig can be made abstract.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-580) Filter Query: Retrieve all docs with facets missing

2008-05-20 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved SOLR-580.
---

Resolution: Invalid

You can actually constrain an fq on all documents that do _not_ have a value in 
a particular field using fq=-field:[* TO *]
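A sketch of building that suggested filter query in client code; the field name here is hypothetical, and the value still needs URL encoding when sent over HTTP:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class MissingFieldFilter {

    // Build the filter-query parameter that matches documents with no
    // value at all in the given field: fq=-field:[* TO *]
    static String missingFieldFq(String field) {
        try {
            return "fq=" + URLEncoder.encode("-" + field + ":[* TO *]", "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(missingFieldFq("price"));
    }
}
```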

 Filter Query: Retrieve all docs with facets missing
 ---

 Key: SOLR-580
 URL: https://issues.apache.org/jira/browse/SOLR-580
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Patrick Debois
Priority: Minor

 Consider this list
 facetA - 10
 facetB - 20
 facets missing  - 30
 For facetA and facetB it is easy to select the correct fq=FACET:value. But 
 to be able to see the documents that have missing facets, one needs to 
 specify a NOT fq= for every value in the facet.
 Therefore a kind of shorthand would be useful to select all documents that 
 have a facet missing. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: how to add a new parameter to solr request

2008-05-20 Thread khirb7

Hello everybody,
I just want to add this example to be clearer. I get this result from
Solr:

<result name="response" numFound="7" start="0" maxScore="0.59129626">
<doc>
<str name="id">1</str>
<str name="DocUrl">http://www.sarkozy.fr</str>
<str name="date">01/01/2008</str>
</doc>
<doc>
<str name="id">2</str>
<str name="DocUrl">http://www.sarkozy.fr</str>
<str name="date">31/01/2008</str>
</doc>
<doc>
<str name="id">3</str>
<str name="DocUrl">http://www.sarkozy.fr</str>
<str name="date">15/01/2008</str>
</doc>
 .
 .
 .
</result>

Note that it is the same DocUrl field (http://www.sarkozy.fr) for the three
documents shown above. I want to get something like the following in the
result, keeping only the most recent document:

<result name="response" numFound="7" start="0" maxScore="0.59129626">
<doc>
<str name="id">2</str>
<str name="DocUrl">http://www.sarkozy.fr</str>
<str name="date">31/01/2008</str>
</doc>
 .
 .
 .
</result>

How do I deal with that? Thank you in advance.




-- 
View this message in context: 
http://www.nabble.com/how-to-add-a-new-parameter-to-solr-request-tp17338190p17344135.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



[jira] Updated: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-556:


Fix Version/s: 1.3

 Highlighting of multi-valued fields returns snippets which span multiple 
 different values
 -

 Key: SOLR-556
 URL: https://issues.apache.org/jira/browse/SOLR-556
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Assignee: Mike Klaas
Priority: Minor
 Fix For: 1.3

 Attachments: solr-highlight-multivalued-example.xml, 
 solr-highlight-multivalued.patch


 When highlighting multi-valued fields, the highlighter sometimes returns 
 snippets which span multiple values, e.g. with values "foo" and "bar" and 
 search term "ba" the highlighter will create the snippet foo<em>ba</em>r. 
 Furthermore it sometimes returns smaller snippets than it should, e.g. with 
 value "foobar" and search term "oo" it will create the snippet <em>oo</em> 
 regardless of hl.fragsize.
 I have been unable to determine the real cause for this, or indeed what 
 actually goes on at all. To reproduce the problem, I've used the following 
 steps:
 * create an index with multi-valued fields, one document should have at least 
 3 values for these fields (in my case strings of length between 5 and 15 
 Japanese characters -- as far as I can tell plain old ASCII should produce 
 the same effect though)
 * search for part of a value in such a field with highlighting enabled, the 
 additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, 
 hl.mergeContiguous=true (changing the parameters does not seem to have any 
 effect on the result though)
 * highlighted snippets should show effects described above
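A toy illustration (not the highlighter's actual code) of how a match can appear to span value boundaries when multi-valued content is effectively treated as one contiguous text:

```java
public class CrossValueMatch {

    // Join the values of a multi-valued field into one string, the way a
    // highlighter effectively sees them if value boundaries are not tracked.
    static String naiveJoin(String[] values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) sb.append(v);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] values = { "foo", "bar" };
        // "oba" occurs in no single value, yet it does occur in the joined
        // text, spanning the boundary -- the same way a snippet like
        // foo<em>ba</em>r can be produced across two values.
        System.out.println(naiveJoin(values).contains("oba"));
    }
}
```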

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-536) Automatic binding of results to Beans (for solrj)

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-536:


Fix Version/s: (was: 1.3)

 Automatic binding of results to Beans (for solrj)
 -

 Key: SOLR-536
 URL: https://issues.apache.org/jira/browse/SOLR-536
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.3
Reporter: Noble Paul
Priority: Minor
 Attachments: SOLR-536.patch


 As we are using Java 5, we can use annotations to bind SolrDocument to Java 
 beans directly.
 This can make the usage of solrj a bit simpler.
 The QueryResponse class in solrj can have an extra method as follows:
 public <T> List<T> getResultBeans(Class<T> klass)
 and the bean can have annotations as:
 class MyBean {
 @Field("id") // name is optional
 String id;
 @Field("category")
 List<String> categories;
 }
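The proposed binding can be sketched with plain Java 5 reflection; the @Field annotation and bind method below are hypothetical stand-ins for the patch, not solrj's actual API:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;
import java.util.Map;

public class BeanBinding {

    // Hypothetical stand-in for the proposed solrj @Field annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Field {
        String value() default ""; // Solr field name; defaults to the bean field name
    }

    public static class MyBean {
        @Field("id") String id;
        @Field("category") List<String> categories;
    }

    // Bind one document (field name -> value) to a fresh bean instance,
    // roughly what getResultBeans(Class<T>) would do for each SolrDocument.
    static <T> T bind(Class<T> klass, Map<String, Object> doc) {
        try {
            T bean = klass.newInstance();
            for (java.lang.reflect.Field f : klass.getDeclaredFields()) {
                Field ann = f.getAnnotation(Field.class);
                if (ann == null) continue;
                String name = ann.value().length() == 0 ? f.getName() : ann.value();
                f.setAccessible(true);
                f.set(bean, doc.get(name));
            }
            return bean;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```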

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-579) Extend SimplePost with RecurseDirectories, threads, document encoding , number of docs per commit

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-579:


Fix Version/s: (was: 1.3)

 Extend SimplePost with RecurseDirectories, threads, document encoding , 
 number of docs per commit
 -

 Key: SOLR-579
 URL: https://issues.apache.org/jira/browse/SOLR-579
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.3
 Environment: Applies to all platforms
Reporter: Patrick Debois
Priority: Minor
   Original Estimate: 72h
  Remaining Estimate: 72h

 -When specifying a directory, SimplePost should also read the contents of 
 the directory
 New options for the commandline (some only useful in DATAMODE= files)
 -RECURSEDIRS
 Recursive read of directories as an option; this is useful for 
 directories with a lot of files where the commandline expansion fails and 
 xargs is too slow
 -DOCENCODING (default = system encoding or UTF-8) 
 For non-UTF-8 clients, SimplePost should include a way to set the 
 encoding of the documents posted
 -THREADSIZE (default = 1) 
 For large volume posts, a thread pool makes sense, using the JDK 1.5 
 thread pool model
 -DOCSPERCOMMIT (default = 1)
 Number of documents after which a commit is done, instead of only at 
 the end
 Note: the existing behaviour of the SimplePost tool should not be broken, as 
 (post.sh) might be used in scripts 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-383) Add support for globalization/culture management

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas resolved SOLR-383.
-

   Resolution: Fixed
Fix Version/s: (was: 1.3)

 Add support for globalization/culture management
 

 Key: SOLR-383
 URL: https://issues.apache.org/jira/browse/SOLR-383
 Project: Solr
  Issue Type: Improvement
  Components: clients - C#
Affects Versions: 1.3
Reporter: Jeff Rodenburg
Assignee: Jeff Rodenburg
Priority: Minor

 SolrSharp should supply configuration and/or programmatic control over 
 windows culture settings.  This is important for working with data being 
 saved to indexes that carry certain formatting expectations for various types 
 of fields, both in SolrSharp as well as the solr field counterparts on the 
 server side.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-563) Contrib area for Solr

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-563:


Fix Version/s: (was: 1.3)

 Contrib area for Solr
 -

 Key: SOLR-563
 URL: https://issues.apache.org/jira/browse/SOLR-563
 Project: Solr
  Issue Type: Task
Affects Versions: 1.3
Reporter: Shalin Shekhar Mangar
 Attachments: SOLR-563.patch


 Add a contrib area for Solr and modify existing build.xml to build, package 
 and distribute contrib projects also.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-565) Component to abstract shards from clients

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-565:


Fix Version/s: (was: 1.3)

 Component to abstract shards from clients
 -

 Key: SOLR-565
 URL: https://issues.apache.org/jira/browse/SOLR-565
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: patrick o'leary
Priority: Minor
 Attachments: distributor_component.patch


 A component that will remove the need for calling clients to provide the 
 shards parameter for
 a distributed search. 
 As systems grow, it's better to manage shards with in solr, rather than 
 managing each client.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-551) SOlr replication should include the schema also

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-551:


Fix Version/s: (was: 1.3)

 SOlr replication should include the schema also
 ---

 Key: SOLR-551
 URL: https://issues.apache.org/jira/browse/SOLR-551
 Project: Solr
  Issue Type: Improvement
  Components: replication
Affects Versions: 1.3
Reporter: Noble Paul

 The current Solr replication just copies the data directory. So if the
 schema changes and I do a re-index, it will blissfully copy the index
 and the slaves will fail because of an incompatible schema.
 So the steps we follow are
  * Stop rsync on slaves
  * Update the master with the new schema
  * Re-index data
  * For each slave
  ** Kill the slave
  ** Clean the data directory
  ** Install the new schema
  ** Restart
  ** Do a manual snappull
 The amount of work the admin needs to do is quite significant
 (depending on the number of slaves). These are manual steps and very
 error-prone.
 The solution:
 Make the replication mechanism handle the schema replication also. So
 all I need to do is change the master and the slaves sync
 automatically.
 What is a good way to implement this?
 We have an idea along the following lines.
 This should involve changes to the snapshooter and snappuller scripts
 and the snapinstaller components.
 Every time the snapshooter takes a snapshot it must keep the timestamps
 of schema.xml and elevate.xml (all the files which might affect the
 runtime behaviour in slaves).
 For subsequent snapshots, if the timestamp of any of them has changed,
 it must copy all of them for replication as well.
 The snappuller copies the new directory as usual.
 The snapinstaller checks if these config files are present;
 if yes,
  * It can create a temporary core
  * Install the changed index and configuration
  * Load it completely and swap it out with the original core
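The timestamp bookkeeping this proposal describes might look roughly like the following sketch (hypothetical names, not the snapshooter's actual code):

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class ConfigChangeDetector {

    // At snapshot time, record the timestamps of the config files that
    // affect runtime behaviour on slaves (the mail names schema.xml and
    // elevate.xml); a missing file simply stamps as 0.
    static Map<String, Long> stampConfigs(File confDir, String... names) {
        Map<String, Long> stamps = new HashMap<String, Long>();
        for (String n : names) {
            stamps.put(n, Long.valueOf(new File(confDir, n).lastModified()));
        }
        return stamps;
    }

    // A later snapshot must also ship the config files if any timestamp
    // differs from the previous snapshot's record.
    static boolean configsChanged(Map<String, Long> previous, Map<String, Long> current) {
        return !previous.equals(current);
    }
}
```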

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-561) Solr replication by Solr (for windows also)

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-561:


Fix Version/s: (was: 1.3)

 Solr replication by Solr (for windows also)
 ---

 Key: SOLR-561
 URL: https://issues.apache.org/jira/browse/SOLR-561
 Project: Solr
  Issue Type: New Feature
  Components: replication
Affects Versions: 1.3
 Environment: All
Reporter: Noble Paul

 The current replication strategy in Solr involves shell scripts. The 
 following are the drawbacks of the approach:
 * It does not work on Windows
 * Replication works as a separate piece, not integrated with Solr
 * Replication cannot be controlled from the Solr admin/JMX
 * Each operation requires a manual telnet to the host
 Doing the replication in Java has the following advantages:
 * Platform independence
 * Manual steps can be completely eliminated. Everything can be driven from 
 solrconfig.xml.
 ** Adding the URL of the master in the slaves should be good enough to enable 
 replication. Other things like the frequency of
 snapshoot/snappull can also be configured. All other information can be 
 automatically obtained.
 * Start/stop can be triggered from solr/admin or JMX
 * Can get the status/progress while replication is going on. It can also 
 abort an ongoing replication
 * No need to have a login on the machine 
 This issue can track the implementation of Solr replication in Java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-506) Enabling HTTP Cache headers should be configurable on a per-handler basis

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-506:


Fix Version/s: (was: 1.3)

 Enabling HTTP Cache headers should be configurable on a per-handler basis
 -

 Key: SOLR-506
 URL: https://issues.apache.org/jira/browse/SOLR-506
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: Shalin Shekhar Mangar

 HTTP cache headers are needed only for select handler's response and it does 
 not make much sense to enable it globally for all Solr responses.
 Therefore, enabling/disabling cache headers should be configurable on a 
 per-handler basis. It should be enabled by default on the select request 
 handler and disabled by default on all others. It should be possible to 
 override these defaults through configuration as well as through API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-582) Field Aliasing

2008-05-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598403#action_12598403
 ] 

Hoss Man commented on SOLR-582:
---

this sounds like a subset of the brainstorming in this wiki page...

http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams

 Field Aliasing
 --

 Key: SOLR-582
 URL: https://issues.apache.org/jira/browse/SOLR-582
 Project: Solr
  Issue Type: New Feature
Reporter: Patrick Debois
Priority: Minor

  XML documents that are indexed often use meaningful, full-blown names for 
  their fields.
  For power searching, shorthand for these terms would come in handy. 
  This would also help for hard-to-remember values, where one could specify 
  multiple names for the same field.
  Also for multilingual queries this would be interesting.
  I guess there should be a config file that is read by the query parser, 
  substituting terms with their canonical values.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-579) Extend SimplePost with RecurseDirectories, threads, document encoding , number of docs per commit

2008-05-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12598407#action_12598407
 ] 

Hoss Man commented on SOLR-579:
---

FWIW: SimplePostTool isn't intended to really have ... features. It exists 
purely to provide a cross-platform way for people to index the data necessary 
for the tutorial.

i'm -1 on enhancing it in ways that could encourage people to think of it as a 
general purpose reusable tool.

 Extend SimplePost with RecurseDirectories, threads, document encoding , 
 number of docs per commit
 -

 Key: SOLR-579
 URL: https://issues.apache.org/jira/browse/SOLR-579
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.3
 Environment: Applies to all platforms
Reporter: Patrick Debois
Priority: Minor
   Original Estimate: 72h
  Remaining Estimate: 72h

 -When specifying a directory, SimplePost should also read the contents of the 
 directory.
 New options for the command line (some only useful in DATAMODE=files):
 -RECURSEDIRS
 Recursive read of directories as an option; this is useful for 
 directories with a lot of files, where command-line expansion fails and 
 xargs is too slow.
 -DOCENCODING (default = system encoding or UTF-8) 
 For non-UTF-8 clients, SimplePost should include a way to set the 
 encoding of the documents posted.
 -THREADSIZE (default = 1) 
 For large-volume posts, a thread pool makes sense, using the JDK 1.5 
 thread pool model.
 -DOCSPERCOMMIT (default = 1)
 Number of documents after which a commit is done, instead of only at 
 the end.
 Note: take care not to break the existing behaviour of the existing SimplePost 
 tool, as (post.sh) might be used in scripts 
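For scale, the requested options amount to only a little client-side code. A rough Python sketch of the described behaviour (this is not SimplePostTool; the function names, callbacks, and batch size are made up for illustration):

```python
import os

DOCS_PER_COMMIT = 100  # hypothetical -DOCSPERCOMMIT value

def walk_xml_files(root):
    """-RECURSEDIRS: yield every .xml file under root, recursively."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(".xml"):
                yield os.path.join(dirpath, name)

def post_all(root, post, commit):
    """Post each file (read with an explicit encoding, as -DOCENCODING asks),
    committing every DOCS_PER_COMMIT documents instead of only at the end."""
    pending = 0
    for path in walk_xml_files(root):
        with open(path, encoding="utf-8") as f:
            post(f.read())
        pending += 1
        if pending % DOCS_PER_COMMIT == 0:
            commit()
    commit()  # final commit, matching the existing tool's behaviour
```

The `post` and `commit` callbacks stand in for the HTTP calls the real tool makes.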

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-580) Filter Query: Retrieve all docs with facets missing

2008-05-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598411#action_12598411
 ] 

Hoss Man commented on SOLR-580:
---

i'm confused ... assuming these facet counts are for field facetField then 
can't all the docs counted by facet.missing be retrieved using: 
{{fq=-facetField:[* TO *]}} ?
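Hoss's filter can be passed like any other fq parameter, as long as it is URL-encoded. A small sketch (host, port, and field name are placeholders):

```python
from urllib.parse import urlencode

# Build a request that keeps the facet counts but restricts results to
# documents with no value in facetField.
params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "facetField",
    "fq": "-facetField:[* TO *]",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```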

 Filter Query: Retrieve all docs with facets missing
 ---

 Key: SOLR-580
 URL: https://issues.apache.org/jira/browse/SOLR-580
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Patrick Debois
Priority: Minor

 Consider this list:
 facetA - 10
 facetB - 20
 facets missing - 30
 For facetA and facetB it is easy to select the correct fq=FACET:value. But 
 to be able to see the documents that have missing facets, one needs to specify 
 a NOT fq= for every value in the facet.
 Therefore a kind of shorthand would be useful to select all documents that 
 have a facet missing. 
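Without such a shorthand, the workaround is to exclude every known facet value explicitly, which grows with the number of values. A tiny sketch (field and value names are from the example above):

```python
# Build the filter query that excludes every known facet value; purely
# negative clauses combine as "NOT ... AND NOT ...".
values = ["facetA", "facetB"]
fq = " ".join(f"-FACET:{v}" for v in values)
print(fq)  # -FACET:facetA -FACET:facetB
```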

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Release of SOLR 1.3

2008-05-20 Thread Mike Klaas


On 20-May-08, at 1:53 AM, Andrew Savory wrote:

2008/5/19 Chris Hostetter [EMAIL PROTECTED]:

If people are particularly eager to see a 1.3 release, the best  
thing to
do is subscribe to solr-dev and start a dialog there about what  
issues
people think are show stoppers for 1.3 and what assistance the  various
various

people working on those issues can use.


So, what are the show stoppers, how can we help, what can we reassign
to a future release?


I've gone and reassigned a bunch of issues that were labeled 1.3 by  
the original submitter, if the submitter is not a committer (perhaps  
this field shouldn't be editable by everyone).  That still leaves many  
issues, several of which I don't think are critical for 1.3.


I propose that we follow an "ownership" process for getting this  
release out the door: we give committers a week to fill in the  
"assigned to" field in JIRA for the 1.3 issues.  Any issue that isn't  
assigned after one week gets moved to a future release.  Then we can  
each evaluate the issues we are responsible for.


Any non-1.3-marked issues should be added at this time too.


Taking a look through the list there's quite a few issues with patches
attached that aren't applied yet. Clearing these out would cut the
open bug count by almost half:


But then we'd have to open bug reports for each one that says "make  
sure this actually works and that it is the correct direction for  
Solr" :)



It's a little weird to see patch 'development' going on in JIRA
(sometimes for over a year), rather than getting the patches into svn
and then working there... I'd worry that some valuable code history is
getting lost along the way? Yes, it's a tough call between adding
'bad' code and waiting for the perfect patch, but bad code creates
healthy communities and is better than no code :-)


Committing the code to trunk creates a path dependence and  
responsibility for maintaining the code.  There would also be a high  
probability of trunk never being in a releasable state, given the  
chance of there being a half-baked idea in trunk that we don't want to  
be bound to for the rest of Solr's lifetime.


(incidentally, this is the same philosophy we apply at my company,  
except that development is usually done in branches rather than  
patches.)


-Mike


Re: Release of SOLR 1.3

2008-05-20 Thread Shalin Shekhar Mangar
+1 for your suggestions Mike.

I'd like to see a few of the smaller issues get committed in 1.3 such as
SOLR-256 (JMX), SOLR-536 (binding for SolrJ), SOLR-430 (SpellChecker support
in SolrJ) etc. Also, SOLR-561 (replication by Solr) would be really cool to
have in the next release. Noble and I are working on it and plan to give a
patch soon.

Mike -- you removed SOLR-563 (Contrib area for Solr) from 1.3 but it is a
dependency for SOLR-469 (DataImportHandler) as it was decided to have
DataImportHandler as a contrib project. It would also be good to have a
rough release roadmap to work against. Could a fixed release cycle (say, every 6
months) work for Solr?

On Wed, May 21, 2008 at 12:45 AM, Mike Klaas [EMAIL PROTECTED] wrote:


 On 20-May-08, at 1:53 AM, Andrew Savory wrote:

 2008/5/19 Chris Hostetter [EMAIL PROTECTED]:

  If people are particularly eager to see a 1.3 release, the best thing to
 do is subscribe to solr-dev and start a dialog there about what issues
  people think are show stoppers for 1.3 and what assistance the various
 people working on those issues can use.


 So, what are the show stoppers, how can we help, what can we reassign
 to a future release?


 I've gone and reassigned a bunch of issues that were labeled 1.3 by the
 original submitter, if the submitter is not a committer (perhaps this field
 shouldn't be editable by everyone).  That still leaves many issues, several
 of which I don't think are critical for 1.3.

 I propose that we follow an ownership process for getting this release
 out the door: we give committers a week to fill in the assigned to field
 in JIRA for the 1.3 issues.  Any issue that isn't assigned after one week
 gets moved to a future release.  Then we can each evaluate the issues we are
 responsible for.

 Any non-1.3-marked issues should be added at this time too.

  Taking a look through the list there's quite a few issues with patches
 attached that aren't applied yet. Clearing these out would cut the
 open bug count by almost half:


 But then we'd have to open bug reports for each one that says make sure
 this actually works and that it is the correct direction for Solr :)

  It's a little weird to see patch 'development' going on in JIRA
 (sometimes for over a year), rather than getting the patches into svn
 and then working there... I'd worry that some valuable code history is
 getting lost along the way? Yes, it's a tough call between adding
 'bad' code and waiting for the perfect patch, but bad code creates
 healthy communities and is better than no code :-)


 Committing the code to trunk creates a path dependence and responsibility
 for maintaining the code.  There would also be a high probability of trunk
 never being in a releasable state, given the chance of there being a
 half-baked idea in trunk that we don't want to be bound to for the rest of
 Solr's lifetime.

 (incidentally, this is the same philosophy we apply at my company, except
 that development is usually done in branches rather than patches.)

 -Mike




-- 
Regards,
Shalin Shekhar Mangar.


[jira] Commented: (SOLR-565) Component to abstract shards from clients

2008-05-20 Thread Jayson Minard (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598473#action_12598473
 ] 

Jayson Minard commented on SOLR-565:


Another item to consider:

Sometimes you want to control which shards participate in any given query.  
This is an important optimization for large-scale deployments that need to 
quickly subset what is queried so that they do not waste CPU on irrelevant 
shards.  

 Component to abstract shards from clients
 -

 Key: SOLR-565
 URL: https://issues.apache.org/jira/browse/SOLR-565
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: patrick o'leary
Priority: Minor
 Attachments: distributor_component.patch


 A component that will remove the need for calling clients to provide the 
 shards parameter for
 a distributed search. 
 As systems grow, it's better to manage shards with in solr, rather than 
 managing each client.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-551) Solr replication should include the schema also

2008-05-20 Thread Jayson Minard (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598476#action_12598476
 ] 

Jayson Minard commented on SOLR-551:


Why is the schema stored outside of the index? Is another possible option to 
store it in a "magic" record within the index? That would allow anyone who wants 
to see the schema to retrieve it; for example, the UI might want to know the 
static fields quickly and could use the schema to determine that information.  

Basically, can some meta-data about the index be stored in the index which 
solves the replication problem, and makes it more easily accessible to the 
outside world?

 Solr replication should include the schema also
 ---

 Key: SOLR-551
 URL: https://issues.apache.org/jira/browse/SOLR-551
 Project: Solr
  Issue Type: Improvement
  Components: replication
Affects Versions: 1.3
Reporter: Noble Paul

 The current Solr replication just copies the data directory. So if the
 schema changes and I do a re-index, it will blissfully copy the index
 and the slaves will fail because of an incompatible schema.
 So the steps we follow are:
  * Stop rsync on slaves
  * Update the master with the new schema
  * Re-index data
  * For each slave:
  ** Kill the slave
  ** Clean the data directory
  ** Install the new schema
  ** Restart
  ** Do a manual snappull
 The amount of work the admin needs to do is quite significant
 (depending on the number of slaves). These are manual steps and very
 error prone.
 The solution:
 Make the replication mechanism handle the schema replication also. So
 all I need to do is just change the master, and the slaves sync
 automatically.
 What is a good way to implement this?
 We have an idea along the following lines.
 This should involve changes to the snapshooter and snappuller scripts
 and the snapinstaller components.
 Every time the snapshooter takes a snapshot, it must keep the timestamps
 of schema.xml and elevate.xml (all the files which might affect the
 runtime behavior on slaves).
 For subsequent snapshots, if the timestamp of any of them has changed,
 it must copy all of them for replication as well.
 The snappuller copies the new directory as usual.
 The snapinstaller checks if these config files are present;
 if yes:
  * It can create a temporary core
  * Install the changed index and configuration
  * Load it completely and swap it with the original core
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-579) Extend SimplePost with RecurseDirectories, threads, document encoding, number of docs per commit

2008-05-20 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598493#action_12598493
 ] 

Otis Gospodnetic commented on SOLR-579:
---

Same here, -1.  That would create the same situation that we sometimes see over 
in Lucene land where people use the Lucene demo and think *that* is Lucene, or 
they take the demo and want it to run as an out-of-the-box application for them.


 Extend SimplePost with RecurseDirectories, threads, document encoding , 
 number of docs per commit
 -

 Key: SOLR-579
 URL: https://issues.apache.org/jira/browse/SOLR-579
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.3
 Environment: Applies to all platforms
Reporter: Patrick Debois
Priority: Minor
   Original Estimate: 72h
  Remaining Estimate: 72h

 -When specifying a directory, SimplePost should also read the contents of the 
 directory.
 New options for the command line (some only useful in DATAMODE=files):
 -RECURSEDIRS
 Recursive read of directories as an option; this is useful for 
 directories with a lot of files, where command-line expansion fails and 
 xargs is too slow.
 -DOCENCODING (default = system encoding or UTF-8) 
 For non-UTF-8 clients, SimplePost should include a way to set the 
 encoding of the documents posted.
 -THREADSIZE (default = 1) 
 For large-volume posts, a thread pool makes sense, using the JDK 1.5 
 thread pool model.
 -DOCSPERCOMMIT (default = 1)
 Number of documents after which a commit is done, instead of only at 
 the end.
 Note: take care not to break the existing behaviour of the existing SimplePost 
 tool, as (post.sh) might be used in scripts 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-565) Component to abstract shards from clients

2008-05-20 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598495#action_12598495
 ] 

patrick o'leary commented on SOLR-565:
--

That's a different aspect, where you have a map-reduce / ontology / hash-based 
system to focus your queries on certain farms of servers. 

This component could act as an example of how to accomplish that, but there are 
so many possible implementations that it's not feasible to cover every one 
within its scope.


 Component to abstract shards from clients
 -

 Key: SOLR-565
 URL: https://issues.apache.org/jira/browse/SOLR-565
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: patrick o'leary
Priority: Minor
 Attachments: distributor_component.patch


 A component that will remove the need for calling clients to provide the 
 shards parameter for
 a distributed search. 
 As systems grow, it's better to manage shards with in solr, rather than 
 managing each client.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

2008-05-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598494#action_12598494
 ] 

Mark Miller commented on SOLR-553:
--

Probably best to create a new ticket (if necessary) about the <span>ax</span> 
<span>bx</span> instead of <span>ax bx</span> problem. That highlights have 
incorrect matches is far worse. I'll adjust the problem description.

If I remember correctly, this was an ease-of-implementation issue. Part of it 
was fitting into the current Highlighter framework (individual tokens are 
scored and highlighted) and part of it was ease in general, I think. I am not 
sure that it would be too easy to alter.

It's very easy to do with the new Highlighter I have been working on, the 
LargeDocHighlighter. It breaks from the current API and makes this type of 
highlight markup quite easy. It may never see the light of day, though... to do 
what I want, all parts of the query need to be located with the MemoryIndex, 
and the time this takes on non-position-sensitive query clauses is almost 
equal to the savings I get from not iterating through and scoring each token 
in a TokenStream. I do still have hopes I can pull something off, though, and 
it may end up being useful for something else.

For now, highlighting each token seems a small inconvenience to retain all 
the old Highlighter's tests, corner cases, and speed in non-position-sensitive 
scoring. That's not to say there will not be a way, if you take a look at the 
code.

 Highlighter does not match phrase queries correctly
 ---

 Key: SOLR-553
 URL: https://issues.apache.org/jira/browse/SOLR-553
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 1.2
 Environment: all
Reporter: Brian Whitman
Assignee: Otis Gospodnetic
 Attachments: highlighttest.xml, Solr-553.patch, Solr-553.patch, 
 Solr-553.patch


 http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
 Say we search for the band "I Love You But I've Chosen Darkness"
 .../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
 The highlight returns a snippet that does have the name altogether:
 Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But 
 <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
 But also returns unrelated snips from the same page:
 Black Francis Shop <span>I</span> Think <span>I</span> <span>Love</span> 
 <span>You</span>
 LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem 
 from the Lucene end. Solr should get it too.
 Related: SOLR-575 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Release of SOLR 1.3

2008-05-20 Thread Otis Gospodnetic
Hi,

Half-baked things getting into trunk probably won't happen.  Lots of people use 
Solr nightlies (because they are often stable enough).  If we were a bunch of 
people paid to work on Solr, then we'd be more organized/structured and have 
more regular release cycles.  Solr is also not likely to have a very short 
lifetime -- too many people use it, develop for it, and depend on it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Andrew Savory [EMAIL PROTECTED]
 To: solr-dev@lucene.apache.org
 Sent: Tuesday, May 20, 2008 3:51:50 PM
 Subject: Re: Release of SOLR 1.3
 
 Hi Mike,
 
 On 20/05/2008, Mike Klaas wrote:
 
   I've gone and reassigned a bunch of issues that were labeled 1.3 by the
  original submitter, if the submitter is not a committer (perhaps this field
  shouldn't be editable by everyone).  That still leaves many issues, several
  of which I don't think are critical for 1.3.
 
 Cool, thanks for that. Indeed, assigning issues to releases should
 only be possible by committers.
 
   Taking a look through the list there's quite a few issues with patches
   attached that aren't applied yet. Clearing these out would cut the
   open bug count by almost half:
  
 
   But then we'd have to open bug reports for each one that says make sure
  this actually works and that it is the correct direction for Solr :)
 
 Heh. Thankfully many of the patches look well-tested and extremely
 well discussed already, so I'd hope they wouldn't require too many
 followup issues!
 
   It's a little weird to see patch 'development' going on in JIRA
   (sometimes for over a year), rather than getting the patches into svn
   and then working there... I'd worry that some valuable code history is
   getting lost along the way? Yes, it's a tough call between adding
   'bad' code and waiting for the perfect patch, but bad code creates
   healthy communities and is better than no code :-)
 
   Committing the code to trunk creates a path dependence and responsibility
  for maintaining the code.  There would also be a high probability of trunk
  never being in a releasable state, given the chance of there being a
  half-baked idea in trunk that we don't want to be bound to for the rest of
  Solr's lifetime.
 
 I'd tend to disagree: committing the patches to trunk allows
 widespread testing and the chance for wider review of the code to see
 if it does what it should. Only when the code is part of a release is
 there any obligation to a proper lifecycle (ongoing support,
 deprecation, then finally removal).
 
 Of course, being concerned for the state of trunk is a good thing
 overall, but it seems from my casual observation that some
 contributions that are far from half-baked are not making it into
 trunk: this is even worse as it might lead to an unnaturally short
 lifetime for Solr.
 
   (incidentally, this is the same philosophy we apply at my company, except
  that development is usually done in branches rather than patches.)
 
 Sure, I'm currently working in a branch-per-feature environment, and
 it has some advantages for a corporate environment with no community
 concerns. But here we're talking about consensus-driven open
 development, for which a more open approach may be appropriate. True,
 it may seem chaotic and perhaps a bit risky - but with enough eyes on
 the code we can mitigate that risk.
 
 And hey, if some contributions are really controversial, there's
 always the option to do more branches (or even set up a scratchpad).
 
 Just my €0.02!
 
 
 Andrew.
 --
 [EMAIL PROTECTED] / [EMAIL PROTECTED]
 http://www.andrewsavory.com/



Re: Release of SOLR 1.3

2008-05-20 Thread Otis Gospodnetic
I'll take the contrib/ issue if nobody else does.  I would want to see that one 
in 1.3, so we can get DataImportHandler in.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Shalin Shekhar Mangar [EMAIL PROTECTED]
 To: solr-dev@lucene.apache.org
 Sent: Tuesday, May 20, 2008 3:32:21 PM
 Subject: Re: Release of SOLR 1.3
 
 +1 for your suggestions Mike.
 
 I'd like to see a few of the smaller issues get committed in 1.3 such as
 SOLR-256 (JMX), SOLR-536 (binding for SolrJ), SOLR-430 (SpellChecker support
 in SolrJ) etc. Also, SOLR-561 (replication by Solr) would be really cool to
 have in the next release. Noble and I are working on it and plan to give a
 patch soon.
 
 Mike -- you removed SOLR-563 (Contrib area for Solr) from 1.3 but it is a
 dependency for SOLR-469 (DataImportHandler) as it was decided to have
 DataImportHandler as a contrib project. It would also be good to have a
 rough release roadmaps to work against. Can fixed release cycle (say every 6
 months) work for Solr?
 
 On Wed, May 21, 2008 at 12:45 AM, Mike Klaas wrote:
 
 
  On 20-May-08, at 1:53 AM, Andrew Savory wrote:
 
  2008/5/19 Chris Hostetter :
 
   If people are particularly eager to see a 1.3 release, the best thing to
  do is subscribe to solr-dev and start a dialog there about what issues
   people think are show stoppers for 1.3 and what assistance the various
  people working on those issues can use.
 
 
  So, what are the show stoppers, how can we help, what can we reassign
  to a future release?
 
 
  I've gone and reassigned a bunch of issues that were labeled 1.3 by the
  original submitter, if the submitter is not a committer (perhaps this field
  shouldn't be editable by everyone).  That still leaves many issues, several
  of which I don't think are critical for 1.3.
 
  I propose that we follow an ownership process for getting this release
  out the door: we give committers a week to fill in the assigned to field
  in JIRA for the 1.3 issues.  Any issue that isn't assigned after one week
  gets moved to a future release.  Then we can each evaluate the issues we are
  responsible for.
 
  Any non-1.3-marked issues should be added at this time too.
 
   Taking a look through the list there's quite a few issues with patches
  attached that aren't applied yet. Clearing these out would cut the
  open bug count by almost half:
 
 
  But then we'd have to open bug reports for each one that says make sure
  this actually works and that it is the correct direction for Solr :)
 
   It's a little weird to see patch 'development' going on in JIRA
  (sometimes for over a year), rather than getting the patches into svn
  and then working there... I'd worry that some valuable code history is
  getting lost along the way? Yes, it's a tough call between adding
  'bad' code and waiting for the perfect patch, but bad code creates
  healthy communities and is better than no code :-)
 
 
  Committing the code to trunk creates a path dependence and responsibility
  for maintaining the code.  There would also be a high probability of trunk
  never being in a releasable state, given the chance of there being a
  half-baked idea in trunk that we don't want to be bound to for the rest of
  Solr's lifetime.
 
  (incidentally, this is the same philosophy we apply at my company, except
  that development is usually done in branches rather than patches.)
 
  -Mike
 
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.



[jira] Commented: (SOLR-551) Solr replication should include the schema also

2008-05-20 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598498#action_12598498
 ] 

Otis Gospodnetic commented on SOLR-551:
---

The hadoop-user group has a recent thread about synchronizing config 
distribution, and it looks like people really like the idea of retrieving the 
configs from a well-known URL.  Perhaps that's the thing to do here, too (a la 
admin pages).


 Solr replication should include the schema also
 ---

 Key: SOLR-551
 URL: https://issues.apache.org/jira/browse/SOLR-551
 Project: Solr
  Issue Type: Improvement
  Components: replication
Affects Versions: 1.3
Reporter: Noble Paul

 The current Solr replication just copies the data directory. So if the
 schema changes and I do a re-index, it will blissfully copy the index
 and the slaves will fail because of an incompatible schema.
 So the steps we follow are:
  * Stop rsync on slaves
  * Update the master with the new schema
  * Re-index data
  * For each slave:
  ** Kill the slave
  ** Clean the data directory
  ** Install the new schema
  ** Restart
  ** Do a manual snappull
 The amount of work the admin needs to do is quite significant
 (depending on the number of slaves). These are manual steps and very
 error prone.
 The solution:
 Make the replication mechanism handle the schema replication also. So
 all I need to do is just change the master, and the slaves sync
 automatically.
 What is a good way to implement this?
 We have an idea along the following lines.
 This should involve changes to the snapshooter and snappuller scripts
 and the snapinstaller components.
 Every time the snapshooter takes a snapshot, it must keep the timestamps
 of schema.xml and elevate.xml (all the files which might affect the
 runtime behavior on slaves).
 For subsequent snapshots, if the timestamp of any of them has changed,
 it must copy all of them for replication as well.
 The snappuller copies the new directory as usual.
 The snapinstaller checks if these config files are present;
 if yes:
  * It can create a temporary core
  * Install the changed index and configuration
  * Load it completely and swap it with the original core

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-565) Component to abstract shards from clients

2008-05-20 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598499#action_12598499
 ] 

Otis Gospodnetic commented on SOLR-565:
---

I agree.  Let's get this in and then worry about getting fancy.  This should go 
in 1.3 and I'll take it if nobody else does.

 Component to abstract shards from clients
 -

 Key: SOLR-565
 URL: https://issues.apache.org/jira/browse/SOLR-565
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: patrick o'leary
Priority: Minor
 Attachments: distributor_component.patch


 A component that will remove the need for calling clients to provide the 
 shards parameter for
 a distributed search. 
 As systems grow, it's better to manage shards with in solr, rather than 
 managing each client.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-565) Component to abstract shards from clients

2008-05-20 Thread Jayson Minard (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598506#action_12598506
 ] 

Jayson Minard commented on SOLR-565:


Selecting shards by sets is not overly fancy.  You basically allow shards to be 
specified by location, then you allow shard sets to be specified that include 
those shards.  You reference the set (by default there is an "All" set) during 
the query and you are off to the races.

Shard selection by sets covers a lot of ground in terms of bringing in more use 
cases without adding much more complexity.  Really, not much complexity, 
just a bit more code.
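The set resolution Jayson sketches really is just a lookup. A minimal Python sketch with made-up set names and shard locations:

```python
# Named shard sets; "All" is implicit and resolves to every location.
SHARD_SETS = {
    "us": ["host1:8983/solr", "host2:8983/solr"],
    "eu": ["host3:8983/solr"],
}

def resolve_shards(set_name="All"):
    """Turn a set name into the value of the shards parameter."""
    if set_name == "All":
        locations = [s for shards in SHARD_SETS.values() for s in shards]
    else:
        locations = SHARD_SETS[set_name]
    return ",".join(locations)

print(resolve_shards("eu"))  # host3:8983/solr
```

The component would substitute the resolved list for the shards parameter before distributing the query.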



 Component to abstract shards from clients
 -

 Key: SOLR-565
 URL: https://issues.apache.org/jira/browse/SOLR-565
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: patrick o'leary
Priority: Minor
 Attachments: distributor_component.patch


 A component that will remove the need for calling clients to provide the 
 shards parameter for
 a distributed search. 
 As systems grow, it's better to manage shards with in solr, rather than 
 managing each client.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-20 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598512#action_12598512
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Hey guys, I created a dictionary index from the following XML file:
<add>
  <doc>
    <field name="id">10</field>
    <field name="word">pizza</field>
  </doc>
  <doc>
    <field name="id">11</field>
    <field name="word">club</field>
  </doc>
  <doc>
    <field name="id">12</field>
    <field name="word">bar</field>
  </doc>
</add>
My config is the following:
<searchComponent name="spellcheck" 
    class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="dictionary">
    <str name="name">default</str>
    <str name="type">index</str>
    <str name="field">word</str>
    <!-- <str name="indexDir">c:/temp/spellindex</str> -->
  </lst>
</searchComponent>
and word is defined in schema.xml as:
<field name="word" type="string" indexed="true" stored="true" 
required="false"/>

When I run a query with the following URL:
http://localhost:8983/solr/select/?q=barr&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=10
I get the following response:
<lst name="spellcheck">
  <lst name="suggestions">
    <int name="numFound">1</int>
    <arr name="barr">
      <str>bar</str>
    </arr>
  </lst>
</lst>
which is what I expect.
However, with this URL:
http://wil1devsch1.cs.tmcs:8983/solr/select/?q=bar&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=10
where bar is correctly spelled, I get the following:
<lst name="spellcheck">
  <lst name="suggestions">
    <int name="numFound">1</int>
    <arr name="bar">
      <str>barr</str>
    </arr>
  </lst>
</lst>
Could you please tell me where the word "barr" is coming from, and why it is 
being suggested? 

Thanks!

 Spell Checker as a Search Component
 ---

 Key: SOLR-572
 URL: https://issues.apache.org/jira/browse/SOLR-572
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Affects Versions: 1.3
Reporter: Shalin Shekhar Mangar
 Fix For: 1.3

 Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch


 Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
 following features:
 * Allow creating a spell index on a given field and make it possible to have 
 multiple spell indices -- one for each field
 * Give suggestions on a per-field basis
 * Given a multi-word query, give only one consistent suggestion
 * Process the query with the same analyzer specified for the source field and 
 process each token separately
 * Allow the user to specify minimum length for a token (optional)
 Consistency criteria for a multi-word query can consist of the following:
 * Preserve the correct words in the original query as it is
 * Never give duplicate words in a suggestion

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Accessing IndexReader during core initialization hangs init

2008-05-20 Thread Chris Hostetter

: While working on SOLR-572, I found that if I try to access the
: IndexReader using SolrCore.getSearcher().get().getReader() within the
: SolrCoreAware.inform method, the initialization process hangs.

I haven't really thought about it before, but it seems logical that 
SolrCore.getSearcher() should work during the inform stage ... there may 
be some chicken-and-egg problems there though -- I'm guessing if it 
doesn't work right now it might be related to the issues with needing to 
inform all plugins before triggering the firstSearcher events (since 
handlers are likely used by those events) -- but it seems like the searcher 
could be created first, then the plugins informed, then the firstSearcher 
events triggered.

: IndexReader in this way? I needed access to the IndexReader so that I
: can create the spell check index during core initialization. For now,
: I've moved the index creation to the first query coming into
: SpellCheckComponent (note to myself: review thread-safety in the init
: code).

As I mentioned in some spelling-related issue recently (although apparently 
not SOLR-572), the straightforward way to do this is to initialize things 
like this when requests with very specific initialization params occur, 
and then document that the recommended way to use your handler is to 
configure a request with those params as part of the firstSearcher event.

(Having initialization work done like this is also necessary to 
allow rebuilding a spelling index after the dictionary has changed.)
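[Editor's aside: the pattern Hoss describes might look roughly like this in solrconfig.xml. This is a sketch only -- QuerySenderListener and the firstSearcher event are real Solr config, but the "spellcheck.build" initialization param is an assumed name, not a settled API at this point in the thread:]

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <!-- warming query whose params also (re)build the spelling index;
           "spellcheck.build" is a hypothetical param name -->
      <str name="q">solr</str>
      <str name="spellcheck">true</str>
      <str name="spellcheck.build">true</str>
    </lst>
  </arr>
</listener>
```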


-Hoss



[jira] Issue Comment Edited: (SOLR-572) Spell Checker as a Search Component

2008-05-20 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598512#action_12598512
 ] 

oleg_gnatovskiy edited comment on SOLR-572 at 5/20/08 3:39 PM:
---

Hey guys I created a dictionary index from the following XML file:
<add>
  <doc>
    <field name="id">10</field>
    <field name="word">pizza</field>
  </doc>
  <doc>
    <field name="id">11</field>
    <field name="word">club</field>
  </doc>
  <doc>
    <field name="id">12</field>
    <field name="word">bar</field>
  </doc>
</add>
My config is the following:
<searchComponent name="spellcheck"
    class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="dictionary">
    <str name="name">default</str>
    <str name="type">index</str>
    <str name="field">word</str>
    <!-- <str name="indexDir">c:/temp/spellindex</str> -->
  </lst>
</searchComponent>
and word is defined in schema.xml as:
<field name="word" type="string" indexed="true" stored="true"
    required="false"/>

When I run a query with the following URL:
http://localhost:8983/solr/select/?q=barr&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=10
I get the following response:
<lst name="spellcheck">
  <lst name="suggestions">
    <int name="numFound">1</int>
    <arr name="barr">
      <str>bar</str>
    </arr>
  </lst>
</lst>
which is what I expect.
However with this URL:
http://localhost:8983/solr/select/?q=bar&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=10
 where bar is correctly spelled, I get the following:
<lst name="spellcheck">
  <lst name="suggestions">
    <int name="numFound">1</int>
    <arr name="bar">
      <str>barr</str>
    </arr>
  </lst>
</lst>
Could you please tell me where the word "barr" is coming from, and why it is 
being suggested? 

Thanks!

  was (Author: oleg_gnatovskiy):
    (identical to the edited text above, except that the second query URL used 
the host wil1devsch1.cs.tmcs:8983 instead of localhost:8983)



[jira] Commented: (SOLR-578) Binary stream response for request

2008-05-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598532#action_12598532
 ] 

Hoss Man commented on SOLR-578:
---

bq. If your Handler can write out data only using a specific writer, you have 
the flexibility of overriding the 'wt' in the handler. Register your own writer 
in solrconfig.xml.

Correct.  

(a handler can even go so far as to fail in the inform(SolrCore) method if 
the writer it expects is not present)

The ShowFileRequestHandler and RawResponseWriter are good examples of this 
model (although it would probably make sense to change RawResponseWriter to 
implement BinaryQueryResponseWriter at some point)

bq. It is incongruous to have SolrQueryRequest.getContentStreams() but nothing 
similar for SolrQueryResponse.

Only if you are used to thinking of things in terms of the servlet API : )

Generally speaking, the majority of request handlers shouldn't be dealing with 
raw character or binary streams ... they should be dealing with simple objects 
and deferring rendering of those objects to the QueryResponseWriter, which 
decides how to render them based on the wishes of the client. There are 
exceptions to every rule however, hence the approach described here where the 
handler forces a particular response writer.
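[Editor's aside: the handler-forces-a-writer model Hoss describes can be sketched in solrconfig.xml. RawResponseWriter and ShowFileRequestHandler are the real classes he names; the registered writer name and the defaults block are illustrative:]

```xml
<!-- register the writer the handler expects ... -->
<queryResponseWriter name="raw" class="solr.RawResponseWriter"/>

<!-- ... and have the handler default "wt" to that writer -->
<requestHandler name="/admin/file" class="solr.admin.ShowFileRequestHandler">
  <lst name="defaults">
    <str name="wt">raw</str>
  </lst>
</requestHandler>
```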

 Binary stream response for request
 --

 Key: SOLR-578
 URL: https://issues.apache.org/jira/browse/SOLR-578
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: Jason Rutherglen

 Allow sending binary response back from request.  This is not the same as 
 encoding in binary such as BinaryQueryResponseWriter.  Simply need access to 
 servlet response stream for sending something like a Lucene segment.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-572) Spell Checker as a Search Component

2008-05-20 Thread Oleg Gnatovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Gnatovskiy updated SOLR-572:
-

Comment: was deleted




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-20 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598533#action_12598533
 ] 

Oleg Gnatovskiy commented on SOLR-572:
--

Hey guys, please disregard my last comment; I had a configuration issue that 
caused the problem. I was just wondering if there is a way to get the 
suggestions not to echo the query if there are no suggestions. For example, a 
query where q=food probably should not return a suggestion of food.




[jira] Closed: (SOLR-578) Binary stream response for request

2008-05-20 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen closed SOLR-578.
-

   Resolution: Won't Fix
Fix Version/s: 1.3

Ok




[jira] Commented: (SOLR-303) Distributed Search over HTTP

2008-05-20 Thread Lars Kotthoff (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598548#action_12598548
 ] 

Lars Kotthoff commented on SOLR-303:


On closer inspection of the code, are the fields sort and prefix of 
FieldFacet used anywhere at all? They don't seem to be referenced anywhere in 
the code and just removing them doesn't seem to have any obvious effect.

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed_add_tests_for_intended_behavior.patch, 
 distributed_facet_count_bugfix.patch, distributed_pjaol.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch, shards_qt.patch, solr-dist-faceting-non-ascii-all.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-05-20 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598549#action_12598549
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


Oleg -- Thanks for trying out the patch. No, currently it does not signal if 
suggestions are not found, it just returns the query terms themselves. I'll add 
that feature.




[jira] Commented: (SOLR-303) Distributed Search over HTTP

2008-05-20 Thread Gunnar Wagenknecht (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598551#action_12598551
 ] 

Gunnar Wagenknecht commented on SOLR-303:
-

Hi / Hallo,

Thanks for your mail. Unfortunately, I won't be able to answer it
soon. I'm on vacation till June 2nd without access to my mails.




-Gunnar

-- 
Gunnar Wagenknecht
[EMAIL PROTECTED]
http://wagenknecht.org/





[jira] Issue Comment Edited: (SOLR-572) Spell Checker as a Search Component

2008-05-20 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598533#action_12598533
 ] 

oleg_gnatovskiy edited comment on SOLR-572 at 5/20/08 9:23 PM:
---

Hey guys I was just wondering if there is a way to get the suggestions not to 
echo the query if there are no suggestions available. For example, a query 
where q=food probably should not return a suggestion of food.

  was (Author: oleg_gnatovskiy):
    (previous wording; it appears in full in the earlier comment on this issue)