[jira] Issue Comment Edited: (SOLR-308) Add a field that generates an unique id when you have none in your data to index
[ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513817 ]

Thomas Peuss edited comment on SOLR-308 at 7/18/07 11:04 PM:

The use case is the following:
* We get catalog data from vendors (300+). We have no control over the data.
* The only unique thing is the catalogid, which is of course the same for all rows in one catalog.
* In our webapp we first request only the few fields that are needed for the search result display.
* When the customer clicks on a product in the search result, he gets a detail page. To get the info from Solr we need a unique id to read the rest of the fields (50+). This id is generated by this code.

Of course we could add the unique id in a preprocessing step, but we wanted to achieve this with Solr alone.

The update procedure goes like this:
* Delete all documents with a specific catalogId
* Insert the updated catalog data

So you see, we need this id to find the exact same document we have in the search result. We do nothing more with it. Maybe I overlooked something and this can be achieved with existing code. Any hint is welcome.

was:
The use case is the following:
* We get catalog data from vendors.
* The only unique thing is the catalogid, which is of course the same for all rows in one catalog.
* In our webapp we first request only the few fields that are needed for the search result display.
* When the customer clicks on a product in the search result, he gets a detail page. To get the info from Solr we need a unique id to read the rest of the fields (50+). This id is generated by this code.

So you see, we need this id only for reference. We do nothing more with it. Maybe I overlooked something and this can be achieved with existing code. Any hint is welcome.
Add a field that generates an unique id when you have none in your data to index

Key: SOLR-308
URL: https://issues.apache.org/jira/browse/SOLR-308
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Thomas Peuss
Priority: Minor
Attachments: GeneratedId.patch

This patch adds a field that generates a unique id when the data you want to index has none.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
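The update procedure in the comment above (delete everything for one catalog, then re-insert), together with the preprocessing alternative it mentions, could be sketched roughly as follows. This is only an illustration under assumptions: the class CatalogUpdateSketch and its methods are hypothetical and are not part of Solr or of GeneratedId.patch, the field name catalogId is taken from the comment, and the delete-by-query message is assumed to use the usual Solr XML update format. Escaping of Lucene query syntax inside the catalog id is deliberately left out.

```java
import java.util.UUID;

/** Hypothetical helper; not part of Solr or of GeneratedId.patch. */
final class CatalogUpdateSketch {

    /** Builds the XML body of a delete-by-query for one catalog. */
    static String deleteCatalogMessage(String catalogId) {
        // Only XML escaping is done here; query-syntax escaping is not handled.
        return "<delete><query>catalogId:" + escapeXml(catalogId) + "</query></delete>";
    }

    /** Preprocessing alternative: give each row a random UUID before indexing. */
    static String newRowId() {
        return UUID.randomUUID().toString();
    }

    /** Minimal XML escaping for element content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(deleteCatalogMessage("vendor42"));
        System.out.println(newRowId());
    }
}
```

Because the ids are regenerated on every re-insert, they are only good for looking up a document that was returned by a recent search, which matches the use case described.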
[jira] Commented: (SOLR-215) Multiple Solr Cores
[ https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513829 ]

Henri Biestro commented on SOLR-215:

On Otis's comments:
1 & 2- static initializers for lock related value: you are correct, the code was most likely lost in some merge; my bad.
3- SolrInfoRegistry deprecated: you are correct, the functionality is replaced by SolrCore.getSolrCore().getInfoRegistry().
4- classLoader not assigned: not sure why it happens, but this fixes it...
5- checkName is not subtle: I had the idea of normalizing the core name (URL-like normalization, for instance) but did not pursue it, since it might make the replication scripts more complex to modify (i.e. the normalization code would need to be duplicated in the scripts). And since the Solaris scripts were not completely functional (my dev machine being Solaris), I've postponed the task... (I was also dreaming about being able to derive from SolrCore to benefit from the static map, implement a naming policy that would encompass the config & schema name generation, etc.) Anyhow, this can indeed be simplified with a regexp match.
6- finalize(): no, I believe finalizing one core should just ensure that this core is shut down. This is only for completeness though, since I can't see how a core could be gc-ed & finalized before it actually gets shut down & removed from the map of cores.

On Ryan's comments:
1- factory/init interface compatibility break: I'll look into other ways if this is a blocker (ctor, setter or wrap/delegate...).
2- RequestHandlers know core: SolrUpdateServlet is deprecated but is still there; I was just trying to ensure correct/compatible behavior. I agree SolrInit is more clutter than necessity, but it can be dropped easily if there is no need to support the SolrUpdateServlet.
3- I do agree that there must be an easier, more functional way to declare and access a core than the current one. I'll try the route you describe.
4- Having core descriptors (config/schema) as explicit files in a $solrhome/cores directory; might use some naming convention to derive the core name from them (related to uploading/dynamic creation of cores).

I'm mostly off the grid today but I'll try my best on Friday.

Multiple Solr Cores

Key: SOLR-215
URL: https://issues.apache.org/jira/browse/SOLR-215
Project: Solr
Issue Type: Improvement
Reporter: Henri Biestro
Priority: Minor
Attachments: solr-215.patch, solr-215.patch, solr-215.patch, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, solr-trunk-542847.patch, solr-trunk-src.patch

WHAT:
As of 1.2, Solr only instantiates one SolrCore, which handles one Lucene index. This patch is intended to allow multiple cores in Solr, which also brings multiple-index capability. The patch file to grab is solr-215.patch.zip (see MISC section below).

WHY:
The current Solr practical wisdom is that one schema - thus one index - is most likely to accommodate your indexing needs, using a filter to segregate documents if needed. If you really need multiple indexes, deploy multiple web applications. There are some use cases, however, where having multiple indexes or multiple cores through Solr itself may make sense.

Multiple cores:
Deployment issues within some organizations where IT will resist deploying multiple web applications.
Seamless schema update, where you can create a new core and switch to it without starting/stopping servers.
Embedding Solr in your own application (instead of 'raw' Lucene) when you functionally need to segregate schemas & collections.

Multiple indexes:
Multiple language collections where each document exists in different languages, analysis being language dependent.
Having document types that have nothing (or very little) in common with respect to their schema, their lifetime/update frequencies or even collection sizes.

HOW:
The best analogy is to consider that instead of deploying multiple web applications, you can have one web application that hosts more than one Solr core. The patch does not change any of the core logic (nor the core code); each core is configured & behaves exactly as the one core in 1.2; the various caches are per-core, and so is the info-bean-registry. What the patch does is replace the SolrCore singleton with a collection of cores; all the code modifications are driven by the removal of the different singletons (the config, the schema & the core). Each core is 'named' and a static map (keyed by name) makes them easy to manage. You declare one servlet filter mapping per core you want to expose in the web.xml; this allows easy access to each core through a different url.
Hudson build is back to normal: Solr-Nightly #147
See http://lucene.zones.apache.org:8080/hudson/job/Solr-Nightly/147/changes
[jira] Resolved: (SOLR-307) NGramFilterFactory and EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-307.

Resolution: Fixed

Thanks Thomas, this is committed.

NGramFilterFactory and EdgeNGramFilterFactory

Key: SOLR-307
URL: https://issues.apache.org/jira/browse/SOLR-307
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Thomas Peuss
Priority: Minor
Attachments: SolrNGramFilters.patch

Here is a patch that adds an NGramFilterFactory and EdgeNGramFilterFactory to Solr.
[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index
[ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513891 ]

Pieter Berkel commented on SOLR-308:

From the use case you have provided, it sounds like the unique id will change every time you delete and re-insert the document. If this is the case, then perhaps it might be more efficient to use the Lucene document id as your unique id value rather than a separate field? However, as far as I'm aware, there currently isn't any way to access the Lucene doc id from Solr (except perhaps the luke request handler)?
[jira] Created: (SOLR-312) create solrj javadoc in build.xml
create solrj javadoc in build.xml

Key: SOLR-312
URL: https://issues.apache.org/jira/browse/SOLR-312
Project: Solr
Issue Type: New Feature
Components: clients - java
Affects Versions: 1.3
Environment: a new task in build.xml named javadoc-solrj that does pretty much what you'd expect. Creates a new folder build/docs/api-solrj. Heavily based on the example from the solr core javadoc target.
Reporter: Will Johnson
Priority: Minor
Fix For: 1.3
[jira] Updated: (SOLR-312) create solrj javadoc in build.xml
[ https://issues.apache.org/jira/browse/SOLR-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Johnson updated SOLR-312:

Attachment: create-solrj-javadoc.patch

simple patch to add new task
[jira] Commented: (SOLR-215) Multiple Solr Cores
[ https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513896 ]

Otis Gospodnetic commented on SOLR-215:

I didn't even realize this patch would still require cores to be declared a priori in static files such as web.xml. I think this new multi-core functionality should come with the core manager handler, as we said here:
https://issues.apache.org/jira/browse/SOLR-215#action_12506920
https://issues.apache.org/jira/browse/SOLR-215#action_12507189

So, something like:
/admin/coremanager?cmd=add&name=foo&schema=foo-schema.xml&config=foo-solrconfig.xml
(this assumes that foo-schema.xml and foo-solrconfig.xml already exist in the conf/ dir)

One could also POST this and *include* the *content* of the 2 .xml files. In that case the core manager would be the one writing their content to disk in the conf/ dir prior to starting the given core.

My suggestion is that this be added in phase 2, after Henri's initial changes are committed. Does this sound reasonable?

Multiple Solr Cores

Key: SOLR-215
URL: https://issues.apache.org/jira/browse/SOLR-215
Project: Solr
Issue Type: Improvement
Reporter: Henri Biestro
Priority: Minor
Attachments: solr-215.patch, solr-215.patch, solr-215.patch, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, solr-trunk-542847.patch, solr-trunk-src.patch

WHAT:
As of 1.2, Solr only instantiates one SolrCore, which handles one Lucene index. This patch is intended to allow multiple cores in Solr, which also brings multiple-index capability. The patch file to grab is solr-215.patch.zip (see MISC section below).

WHY:
The current Solr practical wisdom is that one schema - thus one index - is most likely to accommodate your indexing needs, using a filter to segregate documents if needed. If you really need multiple indexes, deploy multiple web applications. There are some use cases, however, where having multiple indexes or multiple cores through Solr itself may make sense.

Multiple cores:
Deployment issues within some organizations where IT will resist deploying multiple web applications.
Seamless schema update, where you can create a new core and switch to it without starting/stopping servers.
Embedding Solr in your own application (instead of 'raw' Lucene) when you functionally need to segregate schemas & collections.

Multiple indexes:
Multiple language collections where each document exists in different languages, analysis being language dependent.
Having document types that have nothing (or very little) in common with respect to their schema, their lifetime/update frequencies or even collection sizes.

HOW:
The best analogy is to consider that instead of deploying multiple web applications, you can have one web application that hosts more than one Solr core. The patch does not change any of the core logic (nor the core code); each core is configured & behaves exactly as the one core in 1.2; the various caches are per-core, and so is the info-bean-registry. What the patch does is replace the SolrCore singleton with a collection of cores; all the code modifications are driven by the removal of the different singletons (the config, the schema & the core). Each core is 'named' and a static map (keyed by name) makes them easy to manage. You declare one servlet filter mapping per core you want to expose in the web.xml; this allows easy access to each core through a different url.
USAGE (example web deployment, patch installed):

Step0:
java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar solr.xml monitor.xml
Will index the 2 documents in solr.xml & monitor.xml

Step1:
http://localhost:8983/solr/core0/admin/stats.jsp
Will produce the statistics page from the admin servlet on core0 index; 2 documents

Step2:
http://localhost:8983/solr/core1/admin/stats.jsp
Will produce the statistics page from the admin servlet on core1 index; no documents

Step3:
java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar ipod*.xml
java -Durl='http://localhost:8983/solr/core1/update' -jar post.jar mon*.xml
Adds the ipod*.xml to the index of core0 and the mon*.xml to the index of core1; running queries from the admin interface, you can verify the indexes have different content.

USAGE (Java code):

//create a configuration
SolrConfig config = new SolrConfig("solrconfig.xml");
//create a schema
IndexSchema schema = new IndexSchema(config, "schema0.xml");
//create a core from the 2 other.
SolrCore core = new SolrCore("core0", "/path/to/index", config, schema);

//Accessing a core:
SolrCore core = SolrCore.getCore("core0");

PATCH MODIFICATIONS DETAILS (per package):
[jira] Commented: (SOLR-215) Multiple Solr Cores
[ https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513912 ]

Will Johnson commented on SOLR-215:

Did anything ever get baked into the patch for handling the core name as a CGI param instead of as a URL path element? The email thread we had going didn't seem to come to any hard conclusions, but I'd like to lobby for it as a part of the spec. I read through the patch but I couldn't quite follow things enough to tell.

Multiple Solr Cores

Key: SOLR-215
URL: https://issues.apache.org/jira/browse/SOLR-215

PATCH MODIFICATIONS DETAILS (per package):

org.apache.solr.core:
The heaviest modifications are in SolrCore & SolrConfig. SolrCore is the most obvious modification; instead of a singleton, there is a static map of cores keyed by name, and assorted methods. To retain some compatibility, the 'null' named core replaces the singleton for the relevant methods, for instance SolrCore.getCore(). One small constraint on core names is that they can't contain '/' or '\', avoiding potential URL & file path problems. SolrConfig (& SolrIndexConfig) are now used to persist all
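As a rough illustration of the org.apache.solr.core notes above (a static map of cores keyed by name, a 'null' named core standing in for the old singleton, and names that may not contain '/' or '\'), a registry could look something like the sketch below. This is not the code from solr-215.patch: the class CoreRegistrySketch, its method names and the use of plain Object in place of SolrCore are all assumptions made for the example. The regular expression is one possible form of the "regexp match" mentioned earlier in the thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/** Hypothetical registry; Object stands in for SolrCore. Not the patch's code. */
final class CoreRegistrySketch {

    /** Internal key for the unnamed core, since ConcurrentHashMap rejects null keys. */
    private static final String DEFAULT_KEY = "";

    /** A valid name is one or more characters, none of them '/' or '\'. */
    private static final Pattern VALID_NAME = Pattern.compile("[^/\\\\]+");

    private static final Map<String, Object> CORES = new ConcurrentHashMap<>();

    /** null is accepted and denotes the default (formerly singleton) core. */
    static boolean isValidName(String name) {
        return name == null || VALID_NAME.matcher(name).matches();
    }

    static void register(String name, Object core) {
        if (!isValidName(name)) {
            throw new IllegalArgumentException("invalid core name: " + name);
        }
        CORES.put(key(name), core);
    }

    /** Returns the named core, or null if no such core is registered. */
    static Object getCore(String name) {
        return CORES.get(key(name));
    }

    /** Compatibility accessor: the 'null' named core replaces the singleton. */
    static Object getCore() {
        return getCore(null);
    }

    private static String key(String name) {
        return name == null ? DEFAULT_KEY : name;
    }

    public static void main(String[] args) {
        register("core0", "first core");
        register(null, "default core");
        System.out.println(getCore("core0"));
        System.out.println(getCore());
    }
}
```

Requiring at least one character in a name keeps a named core from colliding with the internal key used for the default core.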
[jira] Commented: (SOLR-215) Multiple Solr Cores
[ https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513946 ]

Ryan McKinley commented on SOLR-215:

> My suggestion is that this be added in phase 2, after Henri's initial changes are committed. Does this sound reasonable?

Yes - perhaps getting this checked in without touching handlers or the web-app side is a good idea. It is a little weird, since the multi-core aspect would only be usable programmatically, but that will make it possible to easily bat around a 'core manager' and http design.

The one big question is what to do with the TokenizerFactory API. Yonik, how do you suggest upgrading an interface? The only clean way I can think of is to upgrade the TokenizerFactory interface with a 'MultiCoreTokenizerFactory' adding an additional argument. I don't like it, but I don't know the API compatibility rules well enough to know whether it is required or whether it is ok to change the API.

Will - as is, this patch does not let you dynamically change the core. The cores are statically defined in web.xml. This will change.
[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index
[ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513959 ]

Yonik Seeley commented on SOLR-308:

Lucene docids are transient (they change when the index changes) - they should not be used across different instances of an IndexReader.
[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index
[ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513960 ]

Ryan McKinley commented on SOLR-308:

The easiest option is to add a UUID when you index the data. Another option would be to make this FieldType a plugin and put it in the 'lib' directory.
[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index
[ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513972 ]

Hoss Man commented on SOLR-308:

I understood your data entry/delete reindexing strategy, but I hadn't considered the use case of doing a query and then issuing a followup query to get more details about specific items.

As Yonik points out, exposing the internal Lucene docid would be a bad idea since it may change every time an IndexReader is opened ... even if the doc you are interested in is still in the index (i.e. hasn't been deleted), other deletions may have changed its internal id.

I have no objection to adding a FieldType that can generate a UUID on demand for use cases like this, but having it ignore the input seems a little sketchy to me. It seems like a better approach would be to have a UUIDFieldType with a toInternal() method that tests its input for some marker token (like "NEW" or "*") and, if it sees that token, generates a new UUID; otherwise it uses the literal value. Then you can configure the id field with a defaultValue of "NEW" in the schema and any doc without an id will get a unique one, but if someone tries to update an existing doc whose id they already know, it will still work as well.
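The marker-token idea in the comment above might look something like the following. This is only a sketch: the class UuidMarkerSketch and its static toInternal method are hypothetical stand-ins and do not extend Solr's FieldType or reproduce GeneratedId.patch; the marker value NEW comes from the comment, and java.util.UUID is assumed as the id generator.

```java
import java.util.UUID;

/** Hypothetical stand-in for the proposed UUIDFieldType.toInternal() logic. */
final class UuidMarkerSketch {

    /** Marker token that requests a freshly generated id. */
    static final String NEW_MARKER = "NEW";

    /**
     * Returns a new random UUID when the input is the marker token,
     * otherwise returns the input unchanged so known ids keep working.
     */
    static String toInternal(String value) {
        if (NEW_MARKER.equals(value)) {
            return UUID.randomUUID().toString();
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toInternal("NEW"));       // a generated UUID
        System.out.println(toInternal("SP2514N"));   // the literal id
    }
}
```

With a schema default of NEW for the id field, documents sent without an id would take the first branch, while updates that carry a known id would take the second.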
[jira] Resolved: (SOLR-311) wrong quoting in tutorial - fails on windows
[ https://issues.apache.org/jira/browse/SOLR-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-311.

Resolution: Fixed
Assignee: Hoss Man

Thanks for pointing this out. FYI: the html and pdf versions are generated from the xml by forrest.

Committed revision 557739. Website sync should take ~30 minutes.

wrong quoting in tutorial - fails on windows

Key: SOLR-311
URL: https://issues.apache.org/jira/browse/SOLR-311
Project: Solr
Issue Type: Bug
Components: documentation
Environment: Windows XP and likely other windows variants
Reporter: Paul Sundling
Assignee: Hoss Man
Priority: Trivial

java -Ddata=args -Dcommit=no -jar post.jar '<delete><id>SP2514N</id></delete>'
and
java -Ddata=args -jar post.jar '<delete><query>name:DDR</query></delete>'
should have their single quotes replaced with double quotes. Otherwise, it results in the following error on the Windows command line:

(sample DOS window FAILS)
C:\downloads\temp\apache-solr-1.2.0\example\exampledocs>java -Ddata=args -jar post.jar '<delete><query>name:DDR</query></delete>'
was unexpected at this time.

(sample DOS window WORKS)
C:\downloads\temp\apache-solr-1.2.0\example\exampledocs>java -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing args to http://localhost:8983/solr/update..
SimplePostTool: COMMITting Solr index changes..

As demonstrated, double quotes work on Windows. I also tested double quotes in cygwin, and they should presumably work for linux/UNIX as well.

I started to do a patch, but I see there are three locations where updates might need to be made and I wasn't sure how PDF files were generated, so here's the list of affected source files:
site/tutorial.html
site/tutorial.pdf
src/site/src/documentation/content/xdocs/tutorial.xml
[jira] Resolved: (SOLR-312) create solrj javadoc in build.xml
[ https://issues.apache.org/jira/browse/SOLR-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley resolved SOLR-312.

Resolution: Fixed

added in rev 557774

thanks Will
[jira] Commented: (SOLR-312) create solrj javadoc in build.xml
[ https://issues.apache.org/jira/browse/SOLR-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514003 ]

Paul Sundling commented on SOLR-312:

Maven already has a javadoc target included automatically.
dismax catenated token search
Does anyone have a good idea how to go about searching for concatenated tokens?
Say that the index has "painkiller" and the user types in "pain killer" (without the quotes).

If one were using the standard request handler, the easiest would be to have the client handle it by sending in both variants:
pain OR killer OR painkiller
or a variant like
pain killer OR painkiller

But is there any answer when using dismax? Requiring the client to send in
pain killer painkiller
seems like it may decrease relevance too much if you currently use pf (phrase fields), since the phrase "pain killer painkiller" isn't going to match anything.

Thoughts?

-Yonik
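A minimal sketch of the client-side workaround described for the standard request handler. The function name `expand_catenations` is hypothetical, and treating the multi-word input as a quoted phrase is an assumption of this sketch rather than something the message specifies. It does no escaping of query-syntax characters, so it is only an illustration.

```python
def expand_catenations(user_query: str) -> str:
    """Return a query that matches either the typed phrase or the joined word.

    Hypothetical helper: single-word input is returned unchanged because
    there is nothing to join.
    """
    tokens = user_query.split()
    if len(tokens) < 2:
        return user_query
    phrase = '"' + " ".join(tokens) + '"'   # the words as typed, as a phrase
    catenated = "".join(tokens)             # the words joined into one token
    return f"{phrase} OR {catenated}"


if __name__ == "__main__":
    print(expand_catenations("pain killer"))
```

This only helps with a handler that parses boolean operators; it does not address the dismax case raised in the message.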
[jira] Updated: (SOLR-258) Date based Facets
[ https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-258:
Attachment: date_facets.patch

checkpoint...
* renamed pre/post/inner to before/after/between
* added a new facet.date.hardend param (with test additions)
...still need to tackle the NOW inconsistency issue.

Date based Facets
Key: SOLR-258
URL: https://issues.apache.org/jira/browse/SOLR-258
Project: Solr
Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
Attachments: date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch

1) Allow clients to express concepts like...
* give me facet counts per day for every day this month.
* give me facet counts per hour for every hour of today.
* give me facet counts per hour for every hour of a specific day.
* give me facet counts per hour for every hour of a specific day and give me facet counts for the number of matches before that day, or after that day.
2) Return all data in a way that makes it easy to use to build filter queries on those date ranges.
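The bucketing idea behind these date facets can be sketched in plain Python. This is an illustration, not the patch: the function names are hypothetical, the half-open `[low, high)` buckets are an assumption, and so is the reading of `hardend` as "cut the last bucket at the requested end instead of letting it run a full gap".

```python
from datetime import datetime, timedelta


def date_buckets(start, end, gap, hardend=False):
    """Split [start, end) into consecutive buckets of width `gap`."""
    if gap <= timedelta(0):
        raise ValueError("gap must be positive")
    buckets = []
    low = start
    while low < end:
        high = low + gap
        if hardend and high > end:
            high = end  # assumed meaning of hardend: truncate the last bucket
        buckets.append((low, high))
        low = high
    return buckets


def facet_counts(dates, start, end, gap, hardend=False):
    """Count dates per bucket, plus before/after/between totals."""
    buckets = date_buckets(start, end, gap, hardend)
    last_high = buckets[-1][1] if buckets else start
    return {
        "counts": [(low, sum(1 for d in dates if low <= d < high))
                   for low, high in buckets],
        "before": sum(1 for d in dates if d < start),
        "after": sum(1 for d in dates if d >= last_high),
        "between": sum(1 for d in dates if start <= d < last_high),
    }


if __name__ == "__main__":
    sample = [datetime(2007, 7, 1, 10), datetime(2007, 7, 3, 18)]
    print(facet_counts(sample, datetime(2007, 7, 1),
                       datetime(2007, 7, 3, 12), timedelta(days=1)))
```

Each `(low, high)` pair maps directly onto a range filter, which is the second goal listed in the issue description.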
Re: dismax catenated token search
On 19-Jul-07, at 2:49 PM, Yonik Seeley wrote:
> Does anyone have a good idea how to go about searching for concatenated tokens?
> Say that the index has "painkiller" and the user types in "pain killer" (without the quotes).
>
> If one were using the standard request handler, the easiest would be to have the client handle it by sending in both variants:
> pain OR killer OR painkiller
> or a variant like
> pain killer OR painkiller
>
> But is there any answer when using dismax? Requiring the client to send in
> pain killer painkiller
> seems like it may decrease relevance too much if you currently use pf (phrase fields), since the phrase "pain killer painkiller" isn't going to match anything.
>
> Thoughts?

Yes, pf should be replaced by a word proximity query that doesn't require all words to match :)

-Mike
Re: dismax catenated token search
On 7/19/07, Mike Klaas [EMAIL PROTECTED] wrote:
> On 19-Jul-07, at 2:49 PM, Yonik Seeley wrote:
>> Does anyone have a good idea how to go about searching for concatenated tokens?
>> Say that the index has "painkiller" and the user types in "pain killer" (without the quotes).
>>
>> If one were using the standard request handler, the easiest would be to have the client handle it by sending in both variants:
>> pain OR killer OR painkiller
>> or a variant like
>> pain killer OR painkiller
>>
>> But is there any answer when using dismax? Requiring the client to send in
>> pain killer painkiller
>> seems like it may decrease relevance too much if you currently use pf (phrase fields), since the phrase "pain killer painkiller" isn't going to match anything.
>>
>> Thoughts?
>
> Yes, pf should be replaced by a word proximity query that doesn't require all words to match :)

Some other quick ideas:
1) client issues two separate queries... "pain killer" and "painkiller" and merges results.
2) dismax parameter that throws word catenations into the MaxDisjunction: "a b c" would also search for "ab" and "bc".

-Yonik
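Idea 2 amounts to adding the catenation of each adjacent word pair to the set of terms searched. A minimal sketch of that term generation, with hypothetical function names; how the extra terms would be wired into the dismax disjunctions is left open here, as it is in the message.

```python
def adjacent_catenations(query: str) -> list[str]:
    """Join each pair of neighbouring words: 'a b c' gives ['ab', 'bc']."""
    tokens = query.split()
    return [left + right for left, right in zip(tokens, tokens[1:])]


def terms_with_catenations(query: str) -> list[str]:
    """Original words followed by their adjacent-pair catenations."""
    return query.split() + adjacent_catenations(query)


if __name__ == "__main__":
    print(terms_with_catenations("pain killer"))
```

Only adjacent pairs are joined, so the number of extra terms grows linearly with query length rather than combinatorially.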
[jira] Updated: (SOLR-258) Date based Facets
[ https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-258:
Attachment: date_facets.patch

fixed the NOW issue by refactoring the toExternal(toInternal()) logic into a new DateField.parseMath(Date,String) method ... a DateMathParser is still used internally to deal with the math parsing aspects, but i wanted to leave the assumptions about the date format in the DateField class itself.

comments/critique about this approach welcome.

Date based Facets
Key: SOLR-258
URL: https://issues.apache.org/jira/browse/SOLR-258
Project: Solr
Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
Attachments: date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch

1) Allow clients to express concepts like...
* give me facet counts per day for every day this month.
* give me facet counts per hour for every hour of today.
* give me facet counts per hour for every hour of a specific day.
* give me facet counts per hour for every hour of a specific day and give me facet counts for the number of matches before that day, or after that day.
2) Return all data in a way that makes it easy to use to build filter queries on those date ranges.
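One way to read the NOW fix is that the caller supplies a single "now" value, so every expression in a request resolves against the same instant. The sketch below is a Python analogue of that idea only; it is not the Java DateField.parseMath method, and the tiny grammar it accepts (NOW followed by +N, -N, or / with DAY, HOUR, MINUTE, or SECOND) is an assumption chosen for illustration.

```python
import re
from datetime import datetime, timedelta

# Units supported by this sketch only.
_UNITS = {
    "DAY": timedelta(days=1),
    "HOUR": timedelta(hours=1),
    "MINUTE": timedelta(minutes=1),
    "SECOND": timedelta(seconds=1),
}
_OP = re.compile(r"([+-]\d+|/)(DAY|HOUR|MINUTE|SECOND)")


def _round_down(value, unit):
    """Truncate `value` to the start of the given unit."""
    if unit == "DAY":
        return value.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "HOUR":
        return value.replace(minute=0, second=0, microsecond=0)
    if unit == "MINUTE":
        return value.replace(second=0, microsecond=0)
    return value.replace(microsecond=0)


def parse_math(now, expr):
    """Evaluate an expression such as 'NOW/DAY+1DAY' against a fixed `now`."""
    if not expr.startswith("NOW"):
        raise ValueError("expression must start with NOW")
    rest = expr[len("NOW"):]
    value, pos = now, 0
    while pos < len(rest):
        match = _OP.match(rest, pos)
        if match is None:
            raise ValueError(f"cannot parse {rest[pos:]!r}")
        op, unit = match.groups()
        if op == "/":
            value = _round_down(value, unit)
        else:
            value = value + int(op) * _UNITS[unit]
        pos = match.end()
    return value


if __name__ == "__main__":
    fixed_now = datetime(2007, 7, 19, 14, 49, 30)
    print(parse_math(fixed_now, "NOW/DAY"), parse_math(fixed_now, "NOW/DAY+1DAY"))
```

Because `now` is an argument rather than read from the clock inside the parser, a start and an end computed in the same request cannot drift apart.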
Re: dismax catenated token search
: Yes, pf should be replaced by a word proximity query that doesn't
: require all words to match :)

: 2) dismax parameter that throws word catenations into the MaxDisjunction:
:    a b c would also search for ab and bc.

that doesn't address the inverse problem: when "pain killer" is indexed but the user searches for "painkiller".

I believe both problems can be solved by using the NgramTokenizer on a field in the qf ... but i have not tested this. (i'm not entirely certain what the NgramTokenizer does with whitespace, so it might actually need to be KeywordTokenizer followed by a Filter that strips out interword whitespace, followed by NgramTokenFilter ... or something like that.)

-Hoss
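The chain suggested here (keep the whole input as one token, strip interword whitespace, then emit character n-grams) can be illustrated in plain Python. This does not reproduce the behaviour of the Lucene tokenizer or filter classes; the function names, the lowercasing, and the gram sizes 2 to 3 are assumptions of the sketch. The point is only that both spellings produce identical grams once whitespace is removed.

```python
def char_ngrams(text: str, min_n: int = 2, max_n: int = 3) -> list[str]:
    """All character n-grams of `text`, shortest size first."""
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


def squashed_ngrams(text: str, min_n: int = 2, max_n: int = 3) -> list[str]:
    """Strip all whitespace (and lowercase), then n-gram the result."""
    squashed = "".join(text.split()).lower()
    return char_ngrams(squashed, min_n, max_n)


if __name__ == "__main__":
    same = squashed_ngrams("pain killer") == squashed_ngrams("painkiller")
    print(same)
```

Applying the same analysis at index and query time would cover both directions of the problem, at the cost of a larger index and looser matching.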
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514099 ]

Mike Klaas commented on SOLR-139:

It is my fault that the DUH2 locking is so hairy to begin with, so I should at least review changes to it ;)

With your last change, the locking looks sound. However, I noticed a few things:

This comment is now inaccurate:
+// need to start off with the write lock because we can't aquire
+// the write lock if we need to.

Should openSearcher() call closeSearcher() instead of doing it manually?

It looks like searcherHasChanges is not being reset to false.

Support updateable/modifiable documents
Key: SOLR-139
URL: https://issues.apache.org/jira/browse/SOLR-139
Project: Solr
Issue Type: Improvement
Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Attachments: getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch

It would be nice to be able to update some fields on a document without having to insert the entire document.

Given the way lucene is structured, (for now) one can only modify stored fields.

While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers.

for background, see:
http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293
[jira] Resolved: (SOLR-305) ad support to analysis tool for working with type names instead of just field names
[ https://issues.apache.org/jira/browse/SOLR-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-305.
Resolution: Fixed
Assignee: Hoss Man

Committed revision 557870.

ad support to analysis tool for working with type names instead of just field names
Key: SOLR-305
URL: https://issues.apache.org/jira/browse/SOLR-305
Project: Solr
Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Trivial
Attachments: analysistool.bytype.diff

quick little patch to analysis.jsp so people can choose between specifying a field name or a fieldtype name ... may save time when you want to try out a bunch of different analyzer options because you only have to create the field types - not fields that use them.
[jira] Resolved: (SOLR-102) Ideas for better highlighting
[ https://issues.apache.org/jira/browse/SOLR-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Klaas resolved SOLR-102.
Resolution: Fixed

committed in r557872

Ideas for better highlighting
Key: SOLR-102
URL: https://issues.apache.org/jira/browse/SOLR-102
Project: Solr
Issue Type: Improvement
Components: search
Reporter: Mike Klaas
Assignee: Mike Klaas
Priority: Minor
Attachments: regexfrag.patch, RegexFragmenter.java

A collection of rough enhancements to the default highlighter. Mostly to be used as ideas for future development.

RegexFragmenter - Define a regular expression to indicate points of interest in the target text (e.g., beginning/end of sentences). Fragmenter will attempt to start/end fragments at these locations.