[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-30 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516307
 ] 

Hoss Man commented on SOLR-308:
---

a few misc comments...

1)  ...val.startsWith(NEW)... seems like a bad idea, why not just 
val.equals(NEW) ?

2) classes like IntField and DateField don't currently do strong parsing 
validation in the toInternal method, but this UUIDField class does ... should 
it?

3) should toObject be strongly typed to return UUID ?

4) there shouldn't be new methods in the output writers for this field type ... 
output writers should only need to know about the most primitive types of data 
that should be viable regardless of the client language (ie: string, int, 
float, date, list, etc...).  the UUIDField should just write itself out as a 
string (using <str> in the xml response writer)
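
To illustrate point 1, here is a minimal sketch of why a prefix match on the sentinel is risky. The class name UuidSentinelDemo, the literal value "NEW", and both helper methods are assumptions made for illustration; none of them is taken from the attached patch.

```java
import java.util.UUID;

// Hypothetical helper, not the code from UUIDField.patch.
class UuidSentinelDemo {
    // Assumed sentinel meaning "generate an id for me".
    static final String NEW = "NEW";

    // Prefix match: any value that merely begins with "NEW" is replaced.
    static String resolveWithStartsWith(String val) {
        return val.startsWith(NEW) ? UUID.randomUUID().toString() : val;
    }

    // Exact match: only the sentinel itself triggers generation.
    static String resolveWithEquals(String val) {
        return val.equals(NEW) ? UUID.randomUUID().toString() : val;
    }

    public static void main(String[] args) {
        String id = "NEWSLETTER-42";
        System.out.println(resolveWithStartsWith(id)); // a random UUID; the original id is lost
        System.out.println(resolveWithEquals(id));     // NEWSLETTER-42
    }
}
```

With equals, a client-supplied id that happens to start with the sentinel survives unchanged.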

 Add a field that generates an unique id when you have none in your data to 
 index
 

 Key: SOLR-308
 URL: https://issues.apache.org/jira/browse/SOLR-308
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Thomas Peuss
Priority: Minor
 Attachments: UUIDField.patch, UUIDField.patch


 This patch adds a field that generates an unique id when you have no unique 
 id in your data you want to index.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Master/Slave and Primary/Secondary

2007-07-30 Thread Chris Hostetter

: As it pertains to Solr, I've often used Master and Searcher.
: Probably even more correct would be Indexer and Searcher.
: Primary and Secondary don't quite sound right for the Solr
: situation... (but Master and Slave doesn't capture it any better
: either).

primary/secondary doesn't really apply because the labels are too vague
... primary for what?  primary search box? primary indexing box?

typically the terms primary/secondary relate to failover modes: you have a
primary woozle that does everything a woozle is supposed to do, but if that
woozle stops working the secondary woozle steps in and acts very woozly.

This is not an inherent concept in Solr.

the (computer) master/slave concepts are not inherent in Solr either --
but they are a part of the *distribution* of indexes that Solr has hooks
for.  one Solr instance can be declared the master of the index and
given all the updates to process while other Solr instances can choose to
slave off of a master of their choice and take indexes as is from that
master -- but even then the slaves may themselves be masters as far as
other solr indexes further down a distribution chain are concerned -- so
even using the terminology master/searcher doesn't really apply, since
the slaves may not actually be used for searching, but only as
way-points.

In the end, it's all fairly irrelevant.

While the term master is used quite a bit in the distribution scripts
(to indicate where to pull an index from) the term slave is the only
contentious term in the pair and its use is confined to
distribution.jsp and a few comments in the scripts about
snappuller.status.  if anyone wants to submit a patch that changes the
comments/variable names, feel free -- just as long as it's something
descriptive regarding the nature of the data exchange relationship.



-Hoss



[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-30 Thread Thomas Peuss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516368
 ] 

Thomas Peuss commented on SOLR-308:
---

1.) I changed it.
2.) I removed the check. I understand that this has a performance impact.
3.) I changed it to what DateField and IntField do.
4.) I removed that as well.

If we don't do strong parsing we should call this IDField instead of UUIDField. 
If we don't enforce that this is a UUID we shouldn't name it like that. What 
do you think?
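
The trade-off behind point 2 and the naming question can be sketched as follows. Both methods are hypothetical stand-ins for a field type's toInternal and are not taken from the patch; the strict variant relies on java.util.UUID.fromString, which throws IllegalArgumentException for input it cannot parse.

```java
import java.util.UUID;

// Hypothetical stand-ins for a field type's toInternal(); not taken from the patch.
class ToInternalDemo {
    // Lenient: store whatever the client sent, with no parsing at all.
    static String toInternalLenient(String val) {
        return val;
    }

    // Strict: parse first, so malformed input fails at index time
    // (at the cost of one parse per value).
    static String toInternalStrict(String val) {
        return UUID.fromString(val).toString();
    }

    public static void main(String[] args) {
        System.out.println(toInternalLenient("not-a-uuid")); // accepted unchanged
        try {
            toInternalStrict("not-a-uuid");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the lenient variant the field accepts any string, which is the argument for a more generic name such as IDField.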




[jira] Updated: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-30 Thread Thomas Peuss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Peuss updated SOLR-308:
--

Attachment: UUIDField.patch

Changes based on comments...

 Add a field that generates an unique id when you have none in your data to 
 index
 

 Key: SOLR-308
 URL: https://issues.apache.org/jira/browse/SOLR-308
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Thomas Peuss
Priority: Minor
 Attachments: UUIDField.patch, UUIDField.patch, UUIDField.patch


 This patch adds a field that generates an unique id when you have no unique 
 id in your data you want to index.




[jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-30 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516415
 ] 

Ryan McKinley commented on SOLR-215:


> we can easily 'reinstate' SolrConfig.config by assigning it the 'null' core 
> config as a compatibility (deprecated?) 

yes.  that is good.  

> should I create/upload a new version of the patch with the described 
> modifications or is this taken care of by the committer? (this sounds like a 
> stupid question, my apologies if it is; just let me know).

whatever happens first ;)

If you have time, can you make the modifications?  That will make it easier.

 Multiple Solr Cores
 ---

 Key: SOLR-215
 URL: https://issues.apache.org/jira/browse/SOLR-215
 Project: Solr
  Issue Type: Improvement
Reporter: Henri Biestro
Priority: Minor
 Attachments: solr-215.patch, solr-215.patch, solr-215.patch, 
 solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
 solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
 solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, 
 solr-trunk-542847.patch, solr-trunk-src.patch


 WHAT:
 As of 1.2, Solr only instantiates one SolrCore which handles one Lucene index.
 This patch is intended to allow multiple cores in Solr which also brings 
 multiple indexes capability.
 The patch file to grab is solr-215.patch.zip (see MISC section below).
 WHY:
 The current Solr practical wisdom is that one schema - thus one index - is 
 most likely to accommodate your indexing needs, using a filter to segregate 
 documents if needed. If you really need multiple indexes, deploy multiple web 
 applications.
 There are some use cases however where having multiple indexes or multiple 
 cores through Solr itself may make sense.
 Multiple cores:
 Deployment issues within some organizations where IT will resist deploying 
 multiple web applications.
 Seamless schema update where you can create a new core and switch to it 
 without starting/stopping servers.
 Embedding Solr in your own application (instead of 'raw' Lucene) and 
 functionally need to segregate schemas & collections.
 Multiple indexes:
 Multiple language collections where each document exists in different 
 languages, analysis being language dependent.
 Having document types that have nothing (or very little) in common with 
 respect to their schema, their lifetime/update frequencies or even collection 
 sizes.
 HOW:
 The best analogy is to consider that instead of deploying multiple 
 web-applications, you can have one web-application that hosts more than one 
 Solr core. The patch does not change any of the core logic (nor the core 
 code); each core is configured & behaves exactly as the one core in 1.2; the 
 various caches are per-core & so is the info-bean-registry.
 What the patch does is replace the SolrCore singleton by a collection of 
 cores; all the code modifications are driven by the removal of the different 
 singletons (the config, the schema & the core).
 Each core is 'named' and a static map (keyed by name) makes them easy to 
 manage.
 You declare one servlet filter mapping per core you want to expose in the 
 web.xml; this makes it easy to access each core through a different url. 
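
 The static map of named cores described above might look roughly like the sketch below. CoreRegistry and Core are hypothetical stand-ins, not classes from the patch; the only behaviour taken from this description is that cores are keyed by name and that the 'null' named core stands in for the old singleton.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; the real patch works on SolrCore itself.
class Core {
    final String name;
    Core(String name) { this.name = name; }
}

// Hypothetical registry illustrating "a static map of cores keyed by names".
class CoreRegistry {
    // HashMap accepts a null key, which lets the unnamed core share the map.
    private static final Map<String, Core> CORES =
            Collections.synchronizedMap(new HashMap<String, Core>());

    static void register(Core core) { CORES.put(core.name, core); }

    // Named lookup, one entry per core.
    static Core getCore(String name) { return CORES.get(name); }

    // Legacy-style lookup: the 'null' named core replaces the singleton.
    static Core getCore() { return CORES.get(null); }
}
```

 A synchronized HashMap is used here rather than ConcurrentHashMap only because the latter rejects null keys.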
 USAGE (example web deployment, patch installed):
 Step0:
 java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar solr.xml 
 monitor.xml
 Will index the 2 documents in solr.xml & monitor.xml
 Step1:
 http://localhost:8983/solr/core0/admin/stats.jsp
 Will produce the statistics page from the admin servlet on core0 index; 2 
 documents
 Step2:
 http://localhost:8983/solr/core1/admin/stats.jsp
 Will produce the statistics page from the admin servlet on core1 index; no 
 documents
 Step3:
 java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar ipod*.xml
 java -Durl='http://localhost:8983/solr/core1/update' -jar post.jar mon*.xml
 Adds the ipod*.xml to index of core0 and the mon*.xml to the index of core1;
 running queries from the admin interface, you can verify indexes have 
 different content. 
 USAGE (Java code):
 //create a configuration
 SolrConfig config = new SolrConfig("solrconfig.xml");
 //create a schema
 IndexSchema schema = new IndexSchema(config, "schema0.xml");
 //create a core from the 2 other.
 SolrCore core = new SolrCore("core0", "/path/to/index", config, schema);
 //Accessing a core:
 SolrCore core = SolrCore.getCore("core0"); 
 PATCH MODIFICATIONS DETAILS (per package):
 org.apache.solr.core:
 The heaviest modifications are in SolrCore & SolrConfig.
 SolrCore is the most obvious modification; instead of a singleton, there is a 
 static map of cores keyed by names and assorted methods. To retain some 
 compatibility, the 'null' named core replaces the singleton for the relevant 
 methods, for instance SolrCore.getCore(). One small constraint on the core 
 name is they 

[jira] Updated: (SOLR-139) Support updateable/modifiable documents

2007-07-30 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-139:
--

Attachment: getStoredFields.patch

Attaching a patch for getStoredFields that appears to work.

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Attachments: getStoredFields.patch, getStoredFields.patch, 
 getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293




[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-30 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516473
 ] 

Yonik Seeley commented on SOLR-139:
---

So the big issue now is that I don't think we can use getStoredFields() and do 
document modification outside the update handler.  The biggest reason is that I 
think we need to be able to update documents atomically (in the sense that 
updates should not be lost).

Consider the use case of adding a new tag to a multi-valued field:  if two 
different clients tag a document at the same time, it doesn't seem acceptable 
that one of the tags could be lost.  So I think that we need a modifyDocument() 
call on updateHandler, and perhaps a ModifyUpdateCommand to go along with it.

I'm not sure yet what this means for request processors.  Perhaps another 
method that handles the reloaded storedFields?
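
The lost-update problem described above can be sketched outside Solr. In this sketch a ConcurrentHashMap stands in for the index, and TagStore and addTag are hypothetical names; the point is only that the read-modify-write runs as one atomic step inside the component that owns the data, which is roughly what a modifyDocument() call on the update handler would have to guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an update handler that owns the stored fields.
class TagStore {
    private final ConcurrentHashMap<String, List<String>> tagsById =
            new ConcurrentHashMap<String, List<String>>();

    // compute() runs the read-modify-write atomically per key,
    // so two concurrent callers cannot overwrite each other's tag.
    void addTag(String docId, String tag) {
        tagsById.compute(docId, (id, existing) -> {
            List<String> updated = new ArrayList<String>();
            if (existing != null) {
                updated.addAll(existing);
            }
            updated.add(tag);
            return updated;
        });
    }

    // Returns the current tags, or an empty list for an unknown document.
    List<String> tags(String docId) {
        return tagsById.getOrDefault(docId, new ArrayList<String>());
    }
}
```

If the load and the write-back were instead two separate calls made by each client, both clients could read the same old list and the later write would drop the other client's tag.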




Re: PHP Response Writer for Solr

2007-07-30 Thread Yonik Seeley
On 7/25/07, Pieter Berkel [EMAIL PROTECTED] wrote:
> I've been using the proposed PHP response writer code from SOLR-196
> (eval-able php code) and SOLR-275 (serialized php data) for some time now
> and would like to work towards getting these included in the main Solr
> distribution.
>
> http://www.nabble.com/Created%3A-%28SOLR-196%29-A-PHP-response-writer-for-Solr-tf3458434.html
> http://www.nabble.com/-jira--Created%3A-%28SOLR-275%29-PHP-Serialized-Response-Writer-tf3980951.html
>
> There is quite a bit of code duplication in SOLR-196 which I'd like to
> eliminate if possible, and due to the way php serializes data (e.g. storing
> the number of elements in an array) the JSONWriter may have to be refactored
> (specifically where arrays are written directly using writer.write('{') and
> writer.write('}') rather than the writeArray() method).

I'd be for refactors that wouldn't slow down the current JSON writer,
or increase its memory footprint.

As a specific example, if serializing something like Iterable, the
JSON writer should still be able to stream each element rather than
buffer them all in memory because the count is needed by a subclass.
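
The streaming concern can be made concrete with a small sketch. Both writers below are hypothetical (they are neither the Solr JSONWriter nor the SOLR-275 code), a StringBuilder stands in for the response Writer, and no string escaping is done. The PHP side follows the serialize() array layout a:N:{...}, where the element count N precedes the elements.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical writers, not the Solr JSONWriter or the SOLR-275 code.
class ArrayWriters {
    // JSON: each element can be emitted as soon as it is read; no count is needed.
    static String writeJson(Iterator<String> items) {
        StringBuilder out = new StringBuilder("[");
        while (items.hasNext()) {
            out.append('"').append(items.next()).append('"');
            if (items.hasNext()) out.append(',');
        }
        return out.append(']').toString();
    }

    // PHP serialize(): the element count comes first (a:N:{...}), so an
    // Iterator of unknown length is buffered before anything is written.
    static String writePhpSerialized(Iterator<String> items) {
        List<String> buffered = new ArrayList<String>();
        while (items.hasNext()) buffered.add(items.next());
        StringBuilder out = new StringBuilder("a:" + buffered.size() + ":{");
        for (int i = 0; i < buffered.size(); i++) {
            String s = buffered.get(i);
            // s:<byte length>:"..."; ASCII is assumed, so length() equals the byte length.
            out.append("i:").append(i).append(";s:").append(s.length())
               .append(":\"").append(s).append("\";");
        }
        return out.append('}').toString();
    }
}
```

For the two elements a and b, writeJson yields ["a","b"] and writePhpSerialized yields a:2:{i:0;s:1:"a";i:1;s:1:"b";}. Keeping the buffering in the PHP subclass only would leave the JSON path streaming.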

> In order to differentiate between the two, I propose we rename the
> serialized writer to PHPSerializedResponseWriter to avoid any conflicts with
> the original eval PHPResponseWriter and configure them as such:
>
> <queryResponseWriter name="php" class="org.apache.solr.request.PHPResponseWriter"/>
> <queryResponseWriter name="phps" class="org.apache.solr.request.PHPSerializedResponseWriter"/>

sounds fine.

-Yonik


[jira] Updated: (SOLR-301) Clean up param interface. Leave deprecated options in deprecated classes

2007-07-30 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-301:
---

Attachment: SOLR-301-ParamCleanup.patch

moves SOLR-258 params into their own file.

This touches a lot of files, but isolates (most) deprecated features to 
deprecated classes.

 Clean up param interface.  Leave deprecated options in deprecated classes
 -

 Key: SOLR-301
 URL: https://issues.apache.org/jira/browse/SOLR-301
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-301-ParamCleanup.patch, SOLR-301-ParamCleanup.patch


 In SOLR-135, we moved the parameter handling stuff to a new package: 
 o.a.s.common.params and left @deprecated classes in the old location.
 Classes in the new package should not contain any deprecated options. 
 Additionally, we should aim to separate parameter manipulation logic 
 (DefaultSolrParams, AppendedSolrParams, etc) from 'parameter' interface 
 classes: 'HighlightParams', 'UpdateParams'




[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-30 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516598
 ] 

Ryan McKinley commented on SOLR-139:



 
> So I think that we need a modifyDocument() call on updateHandler, and perhaps 
> a ModifyUpdateCommand to go along with it.
>
> I'm not sure yet what this means for request processors.  Perhaps another 
> method that handles the reloaded storedFields?


Another option might be some sort of transaction or locking model.  Could it 
block other requests while there is an open transaction/lock?

Consider the case where we need the same atomic protection for non-stored 
fields loaded from a SQL database.  In this case, it may be nice to have 
locking/blocking happen at the processor level.

I don't know synchronized well enough to know if this works or is a bad idea, 
but what about something like:

class ModifyExistingDocumentProcessor {
  void processAdd(AddUpdateCommand cmd) {
    String id = cmd.getIndexedId(schema);
    synchronized( updateHandler ) {
      SolrDocument existing = updateHandler.loadStoredFields( id );
      myCustomHelper.loadTagsFromDB( existing );
      cmd.solrDoc = ModifyDocumentUtils.modifyDocument( ... );

      // This eventually calls: updateHandler.addDoc(cmd);
      super.processAdd(cmd);
    }
  }
}

This type of approach would need to make sure everyone modifying fields was 
locking on the same updateHandler.
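
The caveat that every writer must lock on the same object can be shown with a self-contained sketch. SharedLockDemo and its members are hypothetical; a plain HashMap stands in for the index and a single lock object plays the role of updateHandler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a HashMap stands in for the index.
class SharedLockDemo {
    // Plays the role of updateHandler: the one monitor every modifier must use.
    private final Object updateLock = new Object();
    private final Map<String, Integer> counters = new HashMap<String, Integer>();

    // Load, modify, and store happen under the same lock, so increments are not lost.
    void increment(String id) {
        synchronized (updateLock) {
            Integer existing = counters.get(id);                  // load the stored value
            int updated = (existing == null ? 0 : existing) + 1;  // modify it
            counters.put(id, updated);                            // write it back
        }
    }

    // Reads take the same lock so they see a consistent value.
    int get(String id) {
        synchronized (updateLock) {
            Integer value = counters.get(id);
            return value == null ? 0 : value;
        }
    }
}
```

A second code path that updated the map while holding a different lock, or no lock, would bring the lost updates back, which is the weakness of doing this at the processor level.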

- - - -

I'm not against adding a ModifyUpdateCommand, I just like having the modify 
logic sit outside the UpdateHandler.




Re: [jira] Commented: (SOLR-258) Date based Facets

2007-07-30 Thread Greg Ludington
I started looking through this, and it looks very nice, though I do
see one slight nit to pick.  I may be reading this incorrectly, but
two parameters in rangeCount appear to be transposed.  In
SimpleFacets.java, the rangeCount method uses:

new ConstantScoreRangeQuery(field,low,high,iHigh,iLow)

but the Lucene javadocs suggest it is actually

new ConstantScoreRangeQuery(field,low,high,iLow,iHigh)

The iLow and iHigh parameters seem to be reversed.
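
Because both flags are booleans, the compiler cannot catch the swap; it only shows up when the two flags differ. A minimal sketch, using a hypothetical inRange helper (not the Lucene class) that takes its arguments in the low-then-high order given in the javadocs:

```java
// Hypothetical helper mirroring the (low, high, includeLow, includeHigh) parameter order.
class RangeArgsDemo {
    static boolean inRange(int value, int low, int high,
                           boolean includeLow, boolean includeHigh) {
        boolean aboveLow = includeLow ? value >= low : value > low;
        boolean belowHigh = includeHigh ? value <= high : value < high;
        return aboveLow && belowHigh;
    }

    public static void main(String[] args) {
        boolean iLow = true, iHigh = false; // intended range: [10, 20)
        System.out.println(inRange(10, 10, 20, iLow, iHigh)); // true
        System.out.println(inRange(10, 10, 20, iHigh, iLow)); // false: the lower bound became exclusive
    }
}
```

With the flags transposed, values sitting exactly on a range boundary are counted in the wrong bucket or not at all.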

Thanks,
Greg