[jira] Issue Comment Edited: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-19 Thread Thomas Peuss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513817
 ] 

Thomas Peuss edited comment on SOLR-308 at 7/18/07 11:04 PM:
-

The use case is the following:
* We get catalog data from vendors (300+). We have no control over the data.
* The only unique thing is the catalogid, which is of course the same for all 
rows in one catalog.
* In our webapp we request first only a few fields that are needed for the 
search result display.
* When the customer clicks on a product in the search result he gets a detailed 
page. To get the info from Solr we need a unique id to read the rest of the 
fields (50+). This id is generated by this code.

Of course we could add the unique id in a preprocessing step but we wanted to 
achieve this with Solr alone.

The update procedure goes like this:
* Delete all documents with a specific catalogId
* Insert the updated catalog data

So you see we need this id to find the exact same document we have in the 
search result. We do nothing more with it.

Maybe I overlooked something and this can be achieved with existing code. Any 
hint is welcome.
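
For readers following along, the two-step update procedure above (delete everything for one catalogId, then re-insert) could be sketched as a sequence of Solr XML update messages. This is only an illustration under stated assumptions: the class and method names are made up, the field name catalogId is taken from the comment, and catalog ids are assumed to contain no characters that need query-syntax escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the attached patch.
final class CatalogReindex {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Step 1: delete-by-query message removing every document of one catalog. */
    static String deleteMessage(String catalogId) {
        return "<delete><query>catalogId:" + escapeXml(catalogId) + "</query></delete>";
    }

    /** Step 2: add message for a single catalog row (field name -> value). */
    static String addMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
              .append(escapeXml(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    /** Full sequence: one delete, one add per row, then a commit. */
    static List<String> reindexMessages(String catalogId, List<Map<String, String>> rows) {
        List<String> messages = new ArrayList<>();
        messages.add(deleteMessage(catalogId));
        for (Map<String, String> row : rows) {
            messages.add(addMessage(row));
        }
        messages.add("<commit/>");
        return messages;
    }
}
```

Each returned string would be POSTed to the update URL in order; how the messages are sent is left out on purpose.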


 was:
The use case is the following:
* We get catalog data from vendors.
* The only unique thing is the catalogid, which is of course the same for all 
rows in one catalog.
* In our webapp we request first only a few fields that are needed for the 
search result display.
* When the customer clicks on a product in the search result he gets a detailed 
page. To get the info from Solr we need a unique id to read the rest of the 
fields (50+). This id is generated by this code.

So you see we need this id only for reference. We do nothing more with it.

Maybe I overlooked something and this can be achieved with existing code. Any 
hint is welcome.

 Add a field that generates an unique id when you have none in your data to 
 index
 

 Key: SOLR-308
 URL: https://issues.apache.org/jira/browse/SOLR-308
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Thomas Peuss
Priority: Minor
 Attachments: GeneratedId.patch


 This patch adds a field that generates an unique id when you have no unique 
 id in your data you want to index.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-19 Thread Henri Biestro (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513829
 ] 

Henri Biestro commented on SOLR-215:


On Otis's comments:
1 & 2- static initializers for lock related value: you are correct, the code 
was most likely lost in some merge - my bad.
3- SolrInfoRegistry deprecated: you are correct, functionality is replaced by 
SolrCore.getSolrCore().getInfoRegistry().
4- classLoader not assigned: not sure why it happens but this fixes it...
5- checkName is not subtle: I had the idea of normalizing the core name (a 
URL-like normalization, for instance) but did not pursue it since it might make 
the replication scripts more complex to modify (i.e. the normalization code 
would need to be duplicated in the scripts). And since the Solaris scripts were 
not completely functional (my dev machine being Solaris), I've postponed the 
task... (I was also dreaming about being able to derive from SolrCore to 
benefit from the static map, implement a naming policy that would encompass the 
config & schema name generations, etc...). Anyhow, this can indeed be 
simplified with a regexp match.
6- finalize(): no, I believe finalizing one core should just ensure that this 
core is shut down. This is only for completeness though, since I can't see how a 
core could be gc-ed & finalized before it actually gets shut down & removed from 
the map of cores.
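
The regexp match mentioned in point 5 might look roughly like the following. It is a minimal sketch, assuming the only rule is the one stated in the issue description (a core name may not contain '/' or '\'); CoreNames and isValid are hypothetical names, not part of the patch, and the unnamed (null) default core is assumed to be handled elsewhere.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating a regexp-based core name check.
final class CoreNames {

    // One or more characters, none of which is '/' or '\'.
    private static final Pattern VALID = Pattern.compile("[^/\\\\]+");

    /** Returns true for a non-empty name free of '/' and '\'. */
    static boolean isValid(String name) {
        return name != null && VALID.matcher(name).matches();
    }
}
```

The same expression could be reused in the replication scripts, which avoids duplicating any normalization logic there.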

On Ryan's comments:
1- factory/init interface compatibility break: I'll look into other ways if 
this is a blocker (ctor, setter or wrap/delegate...). 
2- RequestHandlers know core: SolrUpdateServlet is deprecated but is still 
there; I was just trying to ensure correct/compatible behavior. I agree 
SolrInit is more clutter than necessity but can be dropped easily if there is 
no need to support the SolrUpdateServlet.
3- I do agree that there must be an easier & more functional way to declare and 
access a core than the current one. I'll try the route you describe.
4- Having core descriptors (config/schema) as explicit files in a 
$solrhome/cores directory; might use some naming convention to derive the core 
name from them (related to uploading/dynamic creation of cores).

I'm mostly off the grid today but I'll try my best on Friday.


 Multiple Solr Cores
 ---

 Key: SOLR-215
 URL: https://issues.apache.org/jira/browse/SOLR-215
 Project: Solr
  Issue Type: Improvement
Reporter: Henri Biestro
Priority: Minor
 Attachments: solr-215.patch, solr-215.patch, solr-215.patch, 
 solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
 solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
 solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, 
 solr-trunk-542847.patch, solr-trunk-src.patch


 WHAT:
 As of 1.2, Solr only instantiates one SolrCore which handles one Lucene index.
 This patch is intended to allow multiple cores in Solr, which also brings 
 multiple-index capability.
 The patch file to grab is solr-215.patch.zip (see MISC section below).
 WHY:
 The current Solr practical wisdom is that one schema - thus one index - is 
 most likely to accommodate your indexing needs, using a filter to segregate 
 documents if needed. If you really need multiple indexes, deploy multiple web 
 applications.
 There are some use cases, however, where having multiple indexes or multiple 
 cores through Solr itself may make sense.
 Multiple cores:
 Deployment issues within some organizations where IT will resist deploying 
 multiple web applications.
 Seamless schema update where you can create a new core and switch to it 
 without starting/stopping servers.
 Embedding Solr in your own application (instead of 'raw' Lucene) when you 
 functionally need to segregate schemas & collections.
 Multiple indexes:
 Multiple language collections where each document exists in different 
 languages, analysis being language dependent.
 Having document types that have nothing (or very little) in common with 
 respect to their schema, their lifetime/update frequencies or even collection 
 sizes.
 HOW:
 The best analogy is to consider that instead of deploying multiple 
 web applications, you can have one web application that hosts more than one 
 Solr core. The patch does not change any of the core logic (nor the core 
 code); each core is configured & behaves exactly as the one core in 1.2; the 
 various caches are per-core & so is the info-bean-registry.
 What the patch does is replace the SolrCore singleton with a collection of 
 cores; all the code modifications are driven by the removal of the different 
 singletons (the config, the schema & the core).
 Each core is 'named' and a static map (keyed by name) makes it easy to manage 
 them.
 You declare one servlet filter mapping per core you want to expose in the 
 web.xml; this allows easy access to each core through a different 
Hudson build is back to normal: Solr-Nightly #147

2007-07-19 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Solr-Nightly/147/changes




[jira] Resolved: (SOLR-307) NGramFilterFactory and EdgeNGramFilterFactory

2007-07-19 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-307.
---

Resolution: Fixed

Thanks Thomas, this is committed.


 NGramFilterFactory and EdgeNGramFilterFactory
 -

 Key: SOLR-307
 URL: https://issues.apache.org/jira/browse/SOLR-307
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Thomas Peuss
Priority: Minor
 Attachments: SolrNGramFilters.patch


 Here is a patch that adds an NGramFilterFactory and EdgeNGramFilterFactory to 
 Solr.




[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-19 Thread Pieter Berkel (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513891
 ] 

Pieter Berkel commented on SOLR-308:


From the usage case you have provided, it sounds like the unique id will 
change every time you delete and re-insert the document.  If this is the case, 
then perhaps it might be more efficient to use the lucene document id as your 
unique id value rather than a separate field?  However, as far as I'm aware, 
there currently isn't any way to access the lucene doc id from solr (except 
perhaps the luke request handler)?





[jira] Created: (SOLR-312) create solrj javadoc in build.xml

2007-07-19 Thread Will Johnson (JIRA)
create solrj javadoc in build.xml
-

 Key: SOLR-312
 URL: https://issues.apache.org/jira/browse/SOLR-312
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.3
 Environment: a new task in build.xml named javadoc-solrj that does 
pretty much what you'd expect.  creates a new folder build/docs/api-solrj.  
heavily based on the example from the solr core javadoc target.
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.3







[jira] Updated: (SOLR-312) create solrj javadoc in build.xml

2007-07-19 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-312:
--

Attachment: create-solrj-javadoc.patch

simple patch to add new task




[jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-19 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513896
 ] 

Otis Gospodnetic commented on SOLR-215:
---

I didn't even realize this patch would still require cores to be declared 
a priori in static files such as web.xml. 

I think this new multi-core functionality should come with the core manager 
handler, as we said here:
https://issues.apache.org/jira/browse/SOLR-215#action_12506920
https://issues.apache.org/jira/browse/SOLR-215#action_12507189

So, something like:
/admin/coremanager?cmd=add&name=foo&schema=foo-schema.xml&config=foo-solrconfig.xml
(this assumes that foo-schema.xml and foo-solrconfig.xml already exist in conf/ 
dir)

One could also POST this and *include* the *content* of the 2 .xml files.  In 
that case the core manager would be the one writing their content to disk in 
conf/ dir prior to starting the given core.
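
To make the proposed request shape concrete, here is a rough sketch of how a core manager might read such a query string, assuming the parameters are separated by '&' and named cmd, name, schema and config as in the example URL. Nothing like this exists in the patch yet; CoreManagerRequest and its methods are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the suggested /admin/coremanager query string.
final class CoreManagerRequest {

    /** Splits "a=1&b=2" into an ordered map, URL-decoding keys and values. */
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate stray separators
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    /** True when the request is an "add" that names a core, a schema and a config. */
    static boolean isCompleteAdd(Map<String, String> params) {
        return "add".equals(params.get("cmd"))
                && params.containsKey("name")
                && params.containsKey("schema")
                && params.containsKey("config");
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

The POST variant that carries the file contents would need a different reader; this only covers the GET form with files already present in conf/.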

My suggestion is that this be added in phase 2, after Henri's initial changes 
are committed.
Does this sound reasonable?



[jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-19 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513912
 ] 

Will Johnson commented on SOLR-215:
---

did anything ever get baked into the patch for handling the core name as a cgi 
param instead of as a url path element?  the email thread we had going didn't 
seem to come to any hard conclusions but i'd like to lobby for it as a part of 
the spec.  i read through the patch but i couldn't quite follow things enough 
to tell.

 Multiple Solr Cores
 ---

 Key: SOLR-215
 URL: https://issues.apache.org/jira/browse/SOLR-215
 Project: Solr
  Issue Type: Improvement
Reporter: Henri Biestro
Priority: Minor
 Attachments: solr-215.patch, solr-215.patch, solr-215.patch, 
 solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
 solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
 solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, 
 solr-trunk-542847.patch, solr-trunk-src.patch


 WHAT:
 As of 1.2, Solr only instantiates one SolrCore which handles one Lucene index.
 This patch is intended to allow multiple cores in Solr, which also brings 
 multiple-index capability.
 The patch file to grab is solr-215.patch.zip (see MISC section below).
 WHY:
 The current Solr practical wisdom is that one schema - thus one index - is 
 most likely to accommodate your indexing needs, using a filter to segregate 
 documents if needed. If you really need multiple indexes, deploy multiple web 
 applications.
 There are some use cases, however, where having multiple indexes or multiple 
 cores through Solr itself may make sense.
 Multiple cores:
 Deployment issues within some organizations where IT will resist deploying 
 multiple web applications.
 Seamless schema update where you can create a new core and switch to it 
 without starting/stopping servers.
 Embedding Solr in your own application (instead of 'raw' Lucene) when you 
 functionally need to segregate schemas & collections.
 Multiple indexes:
 Multiple language collections where each document exists in different 
 languages, analysis being language dependent.
 Having document types that have nothing (or very little) in common with 
 respect to their schema, their lifetime/update frequencies or even collection 
 sizes.
 HOW:
 The best analogy is to consider that instead of deploying multiple 
 web applications, you can have one web application that hosts more than one 
 Solr core. The patch does not change any of the core logic (nor the core 
 code); each core is configured & behaves exactly as the one core in 1.2; the 
 various caches are per-core & so is the info-bean-registry.
 What the patch does is replace the SolrCore singleton with a collection of 
 cores; all the code modifications are driven by the removal of the different 
 singletons (the config, the schema & the core).
 Each core is 'named' and a static map (keyed by name) makes it easy to manage 
 them.
 You declare one servlet filter mapping per core you want to expose in the 
 web.xml; this allows easy access to each core through a different url. 
 USAGE (example web deployment, patch installed):
 Step0:
 java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar solr.xml monitor.xml
 Will index the 2 documents in solr.xml & monitor.xml
 Step1:
 http://localhost:8983/solr/core0/admin/stats.jsp
 Will produce the statistics page from the admin servlet on core0 index; 2 
 documents
 Step2:
 http://localhost:8983/solr/core1/admin/stats.jsp
 Will produce the statistics page from the admin servlet on core1 index; no 
 documents
 Step3:
 java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar ipod*.xml
 java -Durl='http://localhost:8983/solr/core1/update' -jar post.jar mon*.xml
 Adds the ipod*.xml to the index of core0 and the mon*.xml to the index of core1;
 running queries from the admin interface, you can verify the indexes have 
 different content. 
 USAGE (Java code):
 //create a configuration
 SolrConfig config = new SolrConfig("solrconfig.xml");
 //create a schema
 IndexSchema schema = new IndexSchema(config, "schema0.xml");
 //create a core from the 2 other.
 SolrCore core = new SolrCore("core0", "/path/to/index", config, schema);
 //Accessing a core:
 SolrCore sameCore = SolrCore.getCore("core0"); 
 PATCH MODIFICATIONS DETAILS (per package):
 org.apache.solr.core:
 The heaviest modifications are in SolrCore & SolrConfig.
 SolrCore is the most obvious modification; instead of a singleton, there is a 
 static map of cores keyed by names and assorted methods. To retain some 
 compatibility, the 'null' named core replaces the singleton for the relevant 
 methods, for instance SolrCore.getCore(). One small constraint on core 
 names is that they can't contain '/' or '\', avoiding potential url & file path 
 problems.
 SolrConfig (& SolrIndexConfig) are now used to persist all 
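
The description above replaces the SolrCore singleton with a static map of cores keyed by name, where the 'null' named core stands in for the old singleton. The idea can be sketched independently of Solr as follows; CoreRegistry is a hypothetical, simplified stand-in written for illustration and is not the code from the patch (which keeps the map inside SolrCore itself).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry illustrating "a map of cores keyed by name".
final class CoreRegistry<C> {

    // HashMap permits a null key, which models the unnamed default core.
    private final Map<String, C> cores = new HashMap<>();

    /** Registers a core; names may not contain '/' or '\'. */
    synchronized void register(String name, C core) {
        if (name != null && (name.indexOf('/') >= 0 || name.indexOf('\\') >= 0)) {
            throw new IllegalArgumentException("core name may not contain '/' or '\\': " + name);
        }
        cores.put(name, core);
    }

    /** Looks up a core by name; returns null when no such core exists. */
    synchronized C get(String name) {
        return cores.get(name);
    }

    /** The 'null' named core plays the role of the former singleton. */
    synchronized C getDefault() {
        return cores.get(null);
    }

    /** Removes a core from the map, e.g. once it has been shut down. */
    synchronized C remove(String name) {
        return cores.remove(name);
    }
}
```

Keeping the null key for the default core is what lets old call sites that expect a single core keep working unchanged.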

[jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-19 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513946
 ] 

Ryan McKinley commented on SOLR-215:


 
> My suggestion is that this be added in phase 2, after Henri's initial changes 
> are committed.
> Does this sound reasonable?
 

Yes - perhaps getting this checked in without touching handlers or the web-app 
side is a good idea.  It is a little weird since the multi-core aspect would 
only be usable programmatically, but that will make it possible to easily bat 
around a 'core manager' and http design.

The one big question is what to do with the TokenizerFactory API.  

Yonik, how do you suggest upgrading an interface?  The only clean way I can 
think of is to extend the TokenizerFactory interface with a 
'MulitCoreTokenizerFactory' adding an additional argument.  I don't like it, 
but I don't know the API compatibility rules well enough to know whether it is 
required or whether it is ok to change the API.
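
One way to picture the "extend the interface" option is sketched below. The interfaces are deliberately simplified stand-ins with made-up names and signatures (they are not Solr's real TokenizerFactory API); the point is only that existing implementations keep compiling while core-aware ones opt in to the extra argument.

```java
import java.util.Map;

/** Simplified stand-in for an existing factory interface (hypothetical signature). */
interface LegacyFactory {
    void init(Map<String, String> args);
}

/** Extension adding a core-aware init; old implementations are unaffected. */
interface CoreAwareFactory extends LegacyFactory {
    void init(String coreName, Map<String, String> args);
}

/** Hypothetical caller that prefers the core-aware method when it is available. */
final class FactoryInitializer {
    static void initialize(LegacyFactory factory, String coreName, Map<String, String> args) {
        if (factory instanceof CoreAwareFactory) {
            ((CoreAwareFactory) factory).init(coreName, args);
        } else {
            factory.init(args);
        }
    }
}
```

The cost of this approach is the instanceof check at every call site, which is part of why changing the original interface outright may still be the preferable route if the compatibility rules allow it.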



Will - as is, this patch does not let you dynamically change the core.  They 
are statically defined in web.xml.  This will change.


[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513959
 ] 

Yonik Seeley commented on SOLR-308:
---

Lucene docids are transient (they change when the index changes) - they should 
not be used across different instances of an IndexReader.




[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-19 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513960
 ] 

Ryan McKinley commented on SOLR-308:


The easiest option is to add a UUID when you index the data.  

Other options would be to make this FieldType a plugin and put it in the 'lib' 
directory.
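
The first option (adding a UUID when you index the data) needs nothing from Solr at all. A minimal client-side sketch, assuming rows are held as field-name/value maps and the unique key field is called id (both assumptions made for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical preprocessing helper run before documents are sent to Solr.
final class IdAssigner {

    /** Returns a copy of the row that is guaranteed to carry an "id" value. */
    static Map<String, String> withId(Map<String, String> row) {
        Map<String, String> copy = new LinkedHashMap<>(row);
        String id = copy.get("id");
        if (id == null || id.isEmpty()) {
            // Random (type 4) UUID; generated fresh for every row without an id.
            copy.put("id", UUID.randomUUID().toString());
        }
        return copy;
    }
}
```

Rows that already carry an id are passed through untouched, so the same helper works for vendors that do supply one.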




[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-19 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513972
 ] 

Hoss Man commented on SOLR-308:
---

I understood your data entry/delete reindexing strategy, but i hadn't 
considered the use case of doing a query, and then issuing a followup query to 
get more details about specific items.

As Yonik points out, exposing the internal Lucene docid would be a bad idea 
since it may change every time an IndexReader is opened ... even if the doc you 
are interested in is still in the index (ie: hasn't been deleted) other 
deletions may have changed its internal id.

i have no objection to adding a FieldType that can generate UUID on demand for 
use cases like this, but having it ignore the input seems a little sketchy to 
me.  it seems like a better approach would be to have UUIDFieldType with a 
toInternal() method that tests its input for some marker token (like "NEW" or 
"*") and if it sees that token, generates a new UUID, otherwise it uses the 
literal value.  then you can configure the id field with a defaultValue of 
"NEW" in the schema and any doc without an id will get a unique one, but if 
someone tries to update an existing doc whose id they already know, it will 
still work as well.
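
The marker-token idea boils down to a few lines. The sketch below shows only the decision logic as a plain static method, with NEW assumed as the marker; it does not extend Solr's FieldType, and UuidOrLiteral is a hypothetical name rather than the proposed UUIDFieldType itself.

```java
import java.util.UUID;

// Hypothetical illustration of the "marker token" logic for a toInternal() method.
final class UuidOrLiteral {

    /** Marker value that requests a freshly generated id (assumed to be "NEW"). */
    static final String MARKER = "NEW";

    /** Generates a UUID for the marker, otherwise keeps the literal value. */
    static String toInternal(String value) {
        if (MARKER.equals(value)) {
            return UUID.randomUUID().toString();
        }
        return value;
    }
}
```

With the marker configured as the field's default, documents without an id would get a generated one, while an explicit id passes through so updates to known documents still work.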




[jira] Resolved: (SOLR-311) wrong quoting in tutorial - fails on windows

2007-07-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-311.
---

Resolution: Fixed
  Assignee: Hoss Man


thanks for pointing this out.

FYI: the html and pdf versions are generated from the xml by forrest.

Committed revision 557739.
website sync should be ~30 minutes

 wrong quoting in tutorial - fails on windows
 

 Key: SOLR-311
 URL: https://issues.apache.org/jira/browse/SOLR-311
 Project: Solr
  Issue Type: Bug
  Components: documentation
 Environment: Windows XP and likely other windows variants
Reporter: Paul Sundling
Assignee: Hoss Man
Priority: Trivial

 java -Ddata=args -Dcommit=no -jar post.jar 'deleteidSP2514N/id/delete'
 and
 java -Ddata=args -jar post.jar 'deletequeryname:DDR/query/delete'
 should have their single quotes replaced with double quotes.  Otherwise, it 
 results in the following error on windows command line:
 (sample DOS window FAILS)
 C:\downloads\temp\apache-solr-1.2.0\example\exampledocsjava -Ddata=args -jar 
 post.jar 'deletequeryname:DDR/query/delete'
  was unexpected at this time.
 (sample DOS window WORKS)
 C:\downloads\temp\apache-solr-1.2.0\example\exampledocsjava -Ddata=args -jar 
 post.jar deletequeryname:DDR/query/delete
 SimplePostTool: version 1.2
 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, 
 other encodings are not currently supported
 SimplePostTool: POSTing args to http://localhost:8983/solr/update..
 SimplePostTool: COMMITting Solr index changes..
 As demonstrated, double quotes work on Windows.  I also tested double 
 quotes in cygwin, and they should presumably work on Linux/UNIX as well.
 I started to do a patch, but I see there are three locations where updates 
 might need to be made and I wasn't sure how PDF files were generated, so 
 here's the list of affected source files:
 site/tutorial.html
 site/tutorial.pdf
 src/site/src/documentation/content/xdocs/tutorial.xml
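
 One way to avoid shell quoting altogether is to launch post.jar from a
 script that passes the XML as a single argument. The following is only a
 rough Python sketch, not part of the tutorial: the helper names
 (delete_by_query_xml, post_jar_command, run_post) are made up for
 illustration, and it assumes java and post.jar are reachable from the
 working directory.

```python
import subprocess
import xml.etree.ElementTree as ET


def delete_by_query_xml(query):
    """Build a <delete><query>...</query></delete> message.

    ElementTree escapes characters such as < and & inside the query text.
    """
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


def post_jar_command(xml_message):
    """Argument list mirroring the tutorial's post.jar invocation."""
    return ["java", "-Ddata=args", "-jar", "post.jar", xml_message]


def run_post(command):
    """Run the command without a shell, so no shell quoting rules apply.

    Hypothetical helper; it is defined here but not called.
    """
    return subprocess.run(command, check=True)


cmd = post_jar_command(delete_by_query_xml("name:DDR"))
print(cmd[-1])
```

 Because the command is a list and no shell is involved, the angle
 brackets should reach java unchanged on both Windows and UNIX.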




[jira] Resolved: (SOLR-312) create solrj javadoc in build.xml

2007-07-19 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley resolved SOLR-312.


Resolution: Fixed

added in rev 557774

thanks Will

 create solrj javadoc in build.xml
 -

 Key: SOLR-312
 URL: https://issues.apache.org/jira/browse/SOLR-312
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.3
 Environment: a new task in build.xml named javadoc-solrj that does 
 pretty much what you'd expect.  creates a new folder build/docs/api-solrj.  
 heavily based on the example from the solr core javadoc target.
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.3

 Attachments: create-solrj-javadoc.patch







[jira] Commented: (SOLR-312) create solrj javadoc in build.xml

2007-07-19 Thread Paul Sundling (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514003
 ] 

Paul Sundling commented on SOLR-312:


Maven already has a javadoc target included automatically.  

 create solrj javadoc in build.xml
 -

 Key: SOLR-312
 URL: https://issues.apache.org/jira/browse/SOLR-312
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.3
 Environment: a new task in build.xml named javadoc-solrj that does 
 pretty much what you'd expect.  creates a new folder build/docs/api-solrj.  
 heavily based on the example from the solr core javadoc target.
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.3

 Attachments: create-solrj-javadoc.patch







dismax catenated token search

2007-07-19 Thread Yonik Seeley

Does anyone have a good idea how to go about searching for concatenated tokens?

Say that the index has "painkiller" and the user types in
"pain killer" (without the quotes).

If one were using the standard request handler, the easiest would be
to have the client handle it by sending in both variants:
pain OR killer OR painkiller
 or a variant like
"pain killer" OR painkiller

But is there any answer when using dismax?
Requiring the client to send in pain killer painkiller seems like it
may decrease relevance too much if you currently use pf (phrase
fields) since the phrase "pain killer painkiller" isn't going to match
anything.
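
For reference, the client-side expansion described for the standard
request handler could look roughly like the Python sketch below. The
function names are hypothetical, and it only builds the query strings;
it does not talk to Solr.

```python
def or_query(user_input):
    """Return the terms OR'ed together, plus their catenation."""
    terms = user_input.split()
    variants = terms + ["".join(terms)] if len(terms) > 1 else terms
    return " OR ".join(variants)


def phrase_or_catenation_query(user_input):
    """Return the input as a quoted phrase OR'ed with its catenation."""
    terms = user_input.split()
    if len(terms) < 2:
        return user_input.strip()
    return '"' + " ".join(terms) + '" OR ' + "".join(terms)


print(or_query("pain killer"))
print(phrase_or_catenation_query("pain killer"))
```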

Thoughts?

-Yonik


[jira] Updated: (SOLR-258) Date based Facets

2007-07-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-258:
--

Attachment: date_facets.patch

checkpoint...

* renamed pre/post/inner to before/after/between
* added a new facet.date.hardend param (with test additions)

...still need to tackle the NOW inconsistency issue.

 Date based Facets
 -

 Key: SOLR-258
 URL: https://issues.apache.org/jira/browse/SOLR-258
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: date_facets.patch, date_facets.patch, date_facets.patch, 
 date_facets.patch, date_facets.patch, date_facets.patch


 1) Allow clients to express concepts like...
 * give me facet counts per day for every day this month.
 * give me facet counts per hour for every hour of today.
 * give me facet counts per hour for every hour of a specific day.
 * give me facet counts per hour for every hour of a specific day and 
 give me facet counts for the number of matches before that day, or 
 after that day. 
 2) Return all data in a way that makes it easy to use to build filter queries 
 on those date ranges.
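
 As a rough illustration of the two goals above (this is a Python sketch,
 not the patch's code), the ranges and matching filter queries could be
 computed as follows. The field name "timestamp" is hypothetical, and the
 hardend behaviour shown (clipping the last range at the end date) is an
 assumption about what such a parameter would do.

```python
from datetime import datetime, timedelta


def date_ranges(start, end, gap, hardend=False):
    """Split [start, end) into consecutive ranges of width `gap`.

    If the gap does not divide the interval evenly, the last range either
    runs past `end` (hardend=False) or is clipped at `end` (hardend=True).
    """
    ranges = []
    low = start
    while low < end:
        high = low + gap
        if hardend and high > end:
            high = end
        ranges.append((low, high))
        low = high
    return ranges


def filter_query(field, low, high):
    """Format one range as a range query string usable as a filter."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return "%s:[%s TO %s]" % (field, low.strftime(fmt), high.strftime(fmt))


start, end = datetime(2007, 7, 1), datetime(2007, 7, 4)
for low, high in date_ranges(start, end, timedelta(days=1)):
    print(filter_query("timestamp", low, high))
```

 Note that square brackets are inclusive at both ends, so a document that
 falls exactly on a boundary would match two adjacent filter queries.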




Re: dismax catenated token search

2007-07-19 Thread Mike Klaas


On 19-Jul-07, at 2:49 PM, Yonik Seeley wrote:

> Does anyone have a good idea how to go about searching for
> concatenated tokens?
>
> Say that the index has "painkiller" and the user types in
> "pain killer" (without the quotes).
>
> If one were using the standard request handler, the easiest would be
> to have the client handle it by sending in both variants:
> pain OR killer OR painkiller
>  or a variant like
> "pain killer" OR painkiller
>
> But is there any answer when using dismax?
> Requiring the client to send in pain killer painkiller seems like it
> may decrease relevance too much if you currently use pf (phrase
> fields) since the phrase "pain killer painkiller" isn't going to match
> anything.
>
> Thoughts?



Yes, pf should be replaced by a word proximity query that doesn't  
require all words to match :)


-Mike


Re: dismax catenated token search

2007-07-19 Thread Yonik Seeley

On 7/19/07, Mike Klaas [EMAIL PROTECTED] wrote:


> On 19-Jul-07, at 2:49 PM, Yonik Seeley wrote:
>
>> Does anyone have a good idea how to go about searching for
>> concatenated tokens?
>>
>> Say that the index has "painkiller" and the user types in
>> "pain killer" (without the quotes).
>>
>> If one were using the standard request handler, the easiest would be
>> to have the client handle it by sending in both variants:
>> pain OR killer OR painkiller
>>  or a variant like
>> "pain killer" OR painkiller
>>
>> But is there any answer when using dismax?
>> Requiring the client to send in pain killer painkiller seems like it
>> may decrease relevance too much if you currently use pf (phrase
>> fields) since the phrase "pain killer painkiller" isn't going to match
>> anything.
>>
>> Thoughts?
>
> Yes, pf should be replaced by a word proximity query that doesn't
> require all words to match :)


Some other quick ideas:
1) client issues two separate queries... "pain killer" and
   "painkiller" and merges results.
2) dismax parameter that throws word catenations into the MaxDisjunction:
   "a b c" would also search for "ab" and "bc".
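
To make idea 2 concrete, here is a small Python sketch of generating the
adjacent-word catenations. The function names are hypothetical; how the
extra terms would be wired into the dismax query is left open.

```python
def adjacent_catenations(query):
    """Join each pair of neighbouring words into one token."""
    terms = query.split()
    return [a + b for a, b in zip(terms, terms[1:])]


def expanded_terms(query):
    """Original words followed by their adjacent catenations."""
    return query.split() + adjacent_catenations(query)


print(adjacent_catenations("a b c"))
print(expanded_terms("pain killer"))
```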

-Yonik


[jira] Updated: (SOLR-258) Date based Facets

2007-07-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-258:
--

Attachment: date_facets.patch

fixed the NOW issue by refactoring the toExternal(toInternal()) logic into 
a new DateField.parseMath(Date,String) method ... a DateMathParser is still 
used internally to deal with the math parsing aspects, but i wanted to leave 
the assumptions about the date format in the DateField class itself.

comments/critique about this approach welcome.
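
To show why passing the reference date in explicitly can help, here is a
rough Python sketch of the idea (it is not the patch's Java code). The
parse_math function is a hypothetical stand-in that only handles
SECOND, MINUTE, HOUR and DAY, and it assumes the inconsistency came from
NOW being resolved at slightly different instants; with a single `now`
value passed in, every expression resolves against the same moment.

```python
import re
from datetime import datetime, timedelta

_UNITS = {"SECOND": "seconds", "MINUTE": "minutes", "HOUR": "hours", "DAY": "days"}
_TOKEN = re.compile(r"([/+-])(\d*)([A-Z]+)")


def _round_down(value, unit):
    """Truncate `value` to the start of the given unit."""
    if unit == "SECOND":
        return value.replace(microsecond=0)
    if unit == "MINUTE":
        return value.replace(second=0, microsecond=0)
    if unit == "HOUR":
        return value.replace(minute=0, second=0, microsecond=0)
    return value.replace(hour=0, minute=0, second=0, microsecond=0)


def parse_math(now, expr):
    """Evaluate expressions such as NOW/DAY+1DAY relative to `now`."""
    if not expr.startswith("NOW"):
        raise ValueError("expression must start with NOW: %r" % expr)
    rest = expr[3:]
    result, pos = now, 0
    for match in _TOKEN.finditer(rest):
        if match.start() != pos:
            raise ValueError("unparseable expression: %r" % expr)
        pos = match.end()
        op, count, unit = match.groups()
        if unit.endswith("S"):
            unit = unit[:-1]  # accept plural forms such as DAYS
        if unit not in _UNITS:
            raise ValueError("unsupported unit: %r" % unit)
        if op == "/":
            if count:
                raise ValueError("rounding takes no number: %r" % expr)
            result = _round_down(result, unit)
        else:
            if not count:
                raise ValueError("missing number in: %r" % expr)
            delta = timedelta(**{_UNITS[unit]: int(count)})
            result = result + delta if op == "+" else result - delta
    if pos != len(rest):
        raise ValueError("unparseable expression: %r" % expr)
    return result


now = datetime(2007, 7, 19, 14, 49, 30)
print(parse_math(now, "NOW/DAY"), parse_math(now, "NOW/DAY+1DAY"))
```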

 Date based Facets
 -

 Key: SOLR-258
 URL: https://issues.apache.org/jira/browse/SOLR-258
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: date_facets.patch, date_facets.patch, date_facets.patch, 
 date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch


 1) Allow clients to express concepts like...
 * give me facet counts per day for every day this month.
 * give me facet counts per hour for every hour of today.
 * give me facet counts per hour for every hour of a specific day.
 * give me facet counts per hour for every hour of a specific day and 
 give me facet counts for the number of matches before that day, or 
 after that day. 
 2) Return all data in a way that makes it easy to use to build filter queries 
 on those date ranges.




Re: dismax catenated token search

2007-07-19 Thread Chris Hostetter

:  Yes, pf should be replaced by a word proximity query that doesn't
:  require all words to match :)

: 2) dismax parameter that throws word catenations into the MaxDisjunction:
:a b c would also search for ab and bc.

that doesn't address the inverse problem: when pain killer is indexed
but the user searches for painkiller

I believe both problems can be solved by using the NgramTokenizer on a
field in the qf ... but i have not tested this.  (i'm not entirely certain
what the NgramTokenizer does with whitespace, so it might actually need
to be a KeywordTokenizer followed by a Filter that strips out interword
whitespace, followed by an NgramTokenFilter ... or something like that.)
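
A toy Python sketch of that chain (keep the whole input as one token,
strip the whitespace, then cut n-grams) shows why both directions would
line up. It only illustrates the idea; it does not reproduce what the
Lucene n-gram classes actually emit, and the gram size of 3 is arbitrary.

```python
def whitespace_free_ngrams(text, n=3):
    """Lowercase, drop all whitespace, and return the set of n-grams."""
    collapsed = "".join(text.lower().split())
    return {collapsed[i:i + n] for i in range(len(collapsed) - n + 1)}


indexed = whitespace_free_ngrams("pain killer")
queried = whitespace_free_ngrams("painkiller")
print(indexed == queried)
```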


-Hoss



[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-19 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514099
 ] 

Mike Klaas commented on SOLR-139:
-

It is my fault that the DUH2 locking is so hairy to begin with, so I should at 
least review changes to it ;)

With your last change, the locking looks sound.  However, I noticed a few 
things:

This comment is now inaccurate:
+// need to start off with the write lock because we can't aquire
+// the write lock if we need to.

Should openSearcher() call closeSearcher() instead of doing it manually?  It 
looks like searcherHasChanges is not being reset to false.



 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Attachments: getStoredFields.patch, getStoredFields.patch, 
 getStoredFields.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293
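
 Conceptually, the update boils down to reading the stored fields,
 applying the changes, and re-indexing the merged document. The Python
 sketch below only models that merge step; apply_update and the field
 names are hypothetical and unrelated to the attached patches.

```python
def apply_update(stored, set_fields=None, increment=None):
    """Return a new document built from the stored fields plus changes.

    `set_fields` overwrites values; `increment` adds to numeric values
    (a missing field is treated as 0).
    """
    doc = dict(stored)
    doc.update(set_fields or {})
    for name, amount in (increment or {}).items():
        current = doc.get(name, 0)
        if isinstance(current, bool) or not isinstance(current, (int, float)):
            raise TypeError("cannot increment non-numeric field %r" % name)
        doc[name] = current + amount
    return doc


stored = {"id": "SP2514N", "name": "DDR", "popularity": 6}
print(apply_update(stored, {"name": "DDR2"}, {"popularity": 1}))
```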




[jira] Resolved: (SOLR-305) ad support to analysis tool for working with type names instead of just field names

2007-07-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-305.
---

Resolution: Fixed
  Assignee: Hoss Man

Committed revision 557870.


 ad support to analysis tool for working with type names instead of just field 
 names
 ---

 Key: SOLR-305
 URL: https://issues.apache.org/jira/browse/SOLR-305
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Trivial
 Attachments: analysistool.bytype.diff


 quick little patch to analysis.jsp so people can choose between specifying a 
 field name or a fieldtype name ... may save time when you want to try out a 
 bunch of different analyzer options because you only have to create the field 
 types - not fields that use them.




[jira] Resolved: (SOLR-102) Ideas for better highlighting

2007-07-19 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas resolved SOLR-102.
-

Resolution: Fixed

committed in r557872

 Ideas for better highlighting
 -

 Key: SOLR-102
 URL: https://issues.apache.org/jira/browse/SOLR-102
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Mike Klaas
Assignee: Mike Klaas
Priority: Minor
 Attachments: regexfrag.patch, RegexFragmenter.java


 A collection of rough enhancements to the default highlighter. Mostly to be 
 used as ideas for future development.
 RegexFragmenter - Define a regular expression to indicate points of 
 interest in the target text (e.g., beginning/end of sentences).  Fragmenter 
 will attempt to start/end fragments at these locations.
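
 The general idea can be sketched in a few lines of Python. This is not
 the attached RegexFragmenter.java; the function name, the default
 sentence-boundary pattern and the target size are all assumptions made
 for illustration.

```python
import re


def regex_fragments(text, boundary=r"(?<=[.!?])\s+", target=30):
    """Cut `text` into fragments of at least `target` characters.

    Cuts are only made where `boundary` matches, so fragments start and
    end at the points of interest (sentence boundaries by default).
    """
    cut_points = [m.end() for m in re.finditer(boundary, text)] + [len(text)]
    fragments, start = [], 0
    for cut in cut_points:
        # Cut once the fragment is long enough, or at the end of the text.
        if cut - start >= target or cut == len(text):
            if cut > start:
                fragments.append(text[start:cut].strip())
            start = cut
    return fragments


sample = ("Solr is fast. It is built on Lucene. "
          "Highlighting shows matching fragments. The end.")
for fragment in regex_fragments(sample):
    print(fragment)
```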
