adding XML data to SOLR index using DIH (xml-data-config)
We regularly create a Solr index from XML files, using the DIH with a suitably edited xml-data-config.xml. However, whenever new XML files become available, it seems like we have to rebuild the entire index using the Data Import Handler. Are we missing something? Should it be possible to add new XML to the index using /dataimport with delta-import selected? Many thanks to anyone who has been able to add new XML files to the Solr index without reindexing everything.

Paul
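For reference, the DIH commands involved look roughly like the sketch below (the core name "collection1" is a placeholder, and whether delta-import does anything depends entirely on the entity configuration in xml-data-config.xml):

```shell
# A full-import with clean=false adds/updates documents without
# wiping the existing index, which is often the practical answer
# for picking up new XML files.
curl "http://localhost:8983/solr/collection1/dataimport?command=full-import&clean=false&commit=true"

# delta-import only works if the entity defines delta logic; for
# file-based imports, FileListEntityProcessor supports
# newerThan="${dataimporter.last_index_time}" to select new files.
curl "http://localhost:8983/solr/collection1/dataimport?command=delta-import&commit=true"
```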
Re: mapreduce job using soirj 5
Check mapreduce.task.classpath.user.precedence and its equivalent property in different Hadoop versions. HADOOP_OPTS needs to work with this property set to true. I ran into a problem like yours, and playing with these parameters solved it.

On Wed, Jun 17, 2015 at 12:28 AM, adfel70 adfe...@gmail.com wrote:
We cannot downgrade httpclient in solrj5 because it's using new features, and we don't want to start altering Solr code. Anyway, we thought about upgrading httpclient in Hadoop, but as Erick said, it sounds like more work than just putting the jar on the data nodes. About that flag: we tried it, and Hadoop even has an environment variable HADOOP_USER_CLASSPATH_FIRST, but all our tests with that flag failed. We think this is an issue that Solr users are more likely to encounter than Cloudera users, so we would be glad for a more elegant solution or workaround than replacing the httpclient jar on the data nodes. Thank you all for your responses.

--
View this message in context: http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199p4212350.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Shenghua (Daniel) Wan
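As a sketch of the suggestion above (jar name, driver class, and paths are placeholders; mapreduce.task.classpath.user.precedence is the property named in this thread and may be distro-specific, while mapreduce.job.user.classpath.first is the standard Hadoop 2 equivalent):

```shell
# Ask MapReduce to put the job's own jars (e.g. solrj 5's newer
# httpclient) ahead of Hadoop's bundled ones.
export HADOOP_USER_CLASSPATH_FIRST=true   # affects the client-side JVM

# -D flags assume the driver uses ToolRunner/GenericOptionsParser.
hadoop jar my-indexer-job.jar com.example.IndexDriver \
  -Dmapreduce.job.user.classpath.first=true \
  -Dmapreduce.task.classpath.user.precedence=true \
  /input /output
```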
Matching Queries with Wildcards and Numbers
Hi! I am a Solr user having an issue with matches on searches using the wildcard operators, specifically when the searches include a wildcard operator with a number. Here is an example: my query will look like (productTitle:*Sidem2*) and match nothing, when it should be matching the productTitle Sidem2. However, searching for Sidem will match the productTitle Sidem2. In addition, I have isolated it to fail to match only when the productTitle has a number in it; for example, a query for (productTitle:*Cupx Collapsed*) will correctly match the product Cupx Collapsed.

I need to use the wildcard operators around the query so that an auto-complete feature can be used, where if a user stops typing at a certain point, a search will be executed on their input so far and it will match the correct product titles.

I have looked all over, through the excellent book Solr In Action by Grainger and Potter, through Stack Overflow and several blog posts, and have not found anything on this specific issue. Common advice is to remove the stemmer, which I have done. I have also added the ReversedWildcardFilterFactory.

Here is a copy of my schema for the specific fieldType if that is any help. Please let me know if anyone has any tips or clues! I am not a very experienced Solr user and would really appreciate any advice.

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <!-- Concatenate characters and numbers by setting catenateAll to 1 - this will avoid problems with alphabetical sort -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" maxFractionAsterisk="0"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <!-- Concatenate characters and numbers by setting catenateAll to 1 - this will avoid problems with alphabetical sort -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thank you in advance!

-- From a sincerely puzzled Solr user, Ellington Kirby
Re: Please help test the new Angular JS Admin UI
Also, while you are at it, it'd be good to get SOLR-4777 in so the Admin UI is correct when users look at the SolrCloud graph after an operation that can leave a slice INACTIVE, e.g. shard split.

On Wed, Jun 17, 2015 at 2:50 PM, Anshum Gupta ans...@anshumgupta.net wrote:
[earlier message and quoted announcement snipped]

--
Anshum Gupta
Re: Where is schema.xml ?
Do you have a managed-schema file, or such? You may have used the configs that have a managed schema, i.e. one that allows you to change the schema via HTTP.

Upayavira

On Wed, Jun 17, 2015, at 02:33 PM, TK Solr wrote:
With Solr 5.2.0, I ran: bin/solr create -c foo
This created solrconfig.xml in the server/solr/foo/conf directory. Other configuration files such as synonyms.txt are found in this directory too. But I don't see schema.xml. Why is schema.xml handled differently? I am guessing server/solr/configsets/sample_techproducts_configs/conf/schema.xml is used by the foo core because it knows about the cat field. Are the template files in sample_techproducts_configs considered standard?
TK
Re: Please help test the new Angular JS Admin UI
This looks good overall and thanks for migrating it to something that more developers can contribute to.

I started solr (trunk) in cloud mode using the bin scripts and opened the new admin UI. The section for 'cores' says 'No cores available. Go and create one'. Starting Solr 5.0, we officially stated in the change log and at other places that the only supported way to create a collection is through the Collections API. We should move along those lines and not stray with the new interface. I am not sure if the intention with this move is to first migrate everything as is and then redo the design, but I'd strongly suggest that we do things the right way.

On Sun, Jun 14, 2015 at 5:53 PM, Erick Erickson erickerick...@gmail.com wrote:

And anyone who, you know, really likes working with UI code please help making it better! As of Solr 5.2, there is a new version of the Admin UI available, and several improvements are already in 5.2.1 (release imminent). The old admin UI is still the default; the new one is available at solr_ip:port/admin/index.html. Currently, you will see very little difference at first glance; the goal for this release was to have as much of the current functionality as possible ported to establish the framework. Upayavira has done almost all of the work getting this in place, thanks for taking that initiative Upayavira!

Anyway, the plan is several-fold:
- Get as much testing on this as possible over the 5.2 time frame.
- Make the new Angular JS-based code the default in 5.3.
- Make improvements/bug fixes to the admin UI on the new code line, particularly SolrCloud functionality.
- Deprecate the current code and remove it eventually.

The new code should be quite a bit easier to work on for programmer types, and there are Big Plans Afoot for making the admin UI more SolrCloud-friendly. Now that the framework is in place, it should be easier for anyone who wants to volunteer to contribute, please do!

So please give it a whirl. I'm sure there will be things that crop up, and any help addressing them will be appreciated. There's already an umbrella JIRA for this work, see: https://issues.apache.org/jira/browse/SOLR-7666. Please link any new issues to this JIRA so we can keep track of it all as well as coordinate efforts. If all goes well, this JIRA can be used to see what's already been reported too. Note that things may be moving pretty quickly, so trunk and 5x will always be the most current. That said, looking at 5.2.1 will be much appreciated.

Erick

--
Anshum Gupta
Re: Please help test the new Angular JS Admin UI
Thanks Ramkumar, will dig into these next week.

Upayavira

On Wed, Jun 17, 2015, at 02:08 PM, Ramkumar R. Aiyengar wrote:
[bug report with stack trace snipped; see Ramkumar's original message]
Re: Where is schema.xml ?
On 6/17/15, 2:35 PM, Upayavira wrote:
Do you have a managed-schema file, or such? You may have used the configs that have a managed schema, i.e. one that allows you to change the schema via HTTP.

I do see a file named managed-schema, without the .xml extension, in the conf directory. Its content does look like a schema.xml file. Is this the initial content of the in-memory schema, which the Schema API then updates dynamically?
Re: Please help test the new Angular JS Admin UI
I started with an empty Solr instance and Firefox 38 on Linux. This is the trunk source.

There's a 'No cores available. Go and create one' button available in the old and the new UI. In the old UI, clicking it goes to the core admin, and pops open the dialog for Add Core. The new UI only goes to the core admin. Also, when you then click on Add Core, the dialog bleeds into the sidebar.

I then started with a getting-started config and a cloud of 2x2. Then brought up the admin UI on one of them, opened up one of the cores, and clicked on the Files tab -- that showed an exception:

{"data":{"responseHeader":{"status":500,"QTime":1},"error":{"msg":"Path must not end with / character","trace":"
java.lang.IllegalArgumentException: Path must not end with / character
	at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:58)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1024)
	at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:319)
	at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:316)
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
	at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:316)
	at org.apache.solr.handler.admin.ShowFileRequestHandler.getAdminFileFromZooKeeper(ShowFileRequestHandler.java:324)
	at org.apache.solr.handler.admin.ShowFileRequestHandler.showFromZooKeeper(ShowFileRequestHandler.java:148)
	at org.apache.solr.handler.admin.ShowFileRequestHandler.handleRequestBody(ShowFileRequestHandler.java:135)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2057)
	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:648)
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:452)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
	at

Moving to Plugins/Stats, and then Core, and selecting the first searcher entry (e.g. for me, it is Searcher@3a7bd1[gettingstarted_shard1_replica1] main), I see stats like:
- searcherName: Searcher@&#8203;3a7bd1[gettingstarted_shard1_replica1] main
- reader: ExitableDirectoryReader(&#8203;UninvertingDirectoryReader(&#8203;))

Notice the unescaped characters there..
Where is schema.xml ?
With Solr 5.2.0, I ran: bin/solr create -c foo

This created solrconfig.xml in the server/solr/foo/conf directory. Other configuration files such as synonyms.txt are found in this directory too. But I don't see schema.xml. Why is schema.xml handled differently? I am guessing server/solr/configsets/sample_techproducts_configs/conf/schema.xml is used by the foo core because it knows about the cat field. Are the template files in sample_techproducts_configs considered standard?

TK
RE: Please help test the new Angular JS Admin UI
I will check with Henry about this problem again.

Best,
Soonho

From: Ramkumar R. Aiyengar [andyetitmo...@gmail.com]
Sent: Wednesday, June 17, 2015 5:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Please help test the new Angular JS Admin UI

[bug report with stack trace snipped; see Ramkumar's original message]
About indexing embed file with solr
Hello, has anyone received my email? I'm new to Solr and I have some questions; could anyone help me with some answers?

I index files directly by extracting their content using the Tika embedded in Solr. There is no problem with normal files. But when I index a Word document with another file embedded in it, such as a PDF embedded in a Word (doc) file, I can't get the content of the embedded file. For example, I have a Word (doc) file with a PDF embedded in it, and I can't index the content of that PDF. When I use the same Tika jar on its own to extract the content of the embedded file, I can get it. I know Tika has been able to extract embedded files since version 1.3, and my Solr version is 4.9.1, where the bundled Tika is 1.5. I don't know why I can't get the content of the embedded file. Could anyone help me? Thank you very much.

Ping Liu
18 June 2015
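One way to see what Solr's built-in Tika actually returns for the container document is the extractOnly flag of the ExtractingRequestHandler; a diagnostic sketch (the core name "collection1" and the file name are placeholders):

```shell
# Return the extracted content instead of indexing it, to check
# whether the embedded PDF's text shows up in what Solr's Tika
# produces for the Word container.
curl "http://localhost:8983/solr/collection1/update/extract?extractOnly=true&wt=json&indent=true" \
     -F "myfile=@report.doc"
```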
Re: Matching Queries with Wildcards and Numbers
This one's going to be confusing to explain. The ability of filters to operate on wildcarded terms at query time is limited to some specific filters; if you're going into the code, see the MultiTermAware-derived filters. Generally, only filters that do _not_ produce more than one output token for a given input token can be MultiTermAware. Gibberish, I know, but bear with me.

WordDelimiterFilterFactory is _NOT_ MultiTermAware because, you guessed it, it can produce more than one token per input token. Specifically, in your example, at index time it'll produce the tokens Sidem and 2. However, at query time Sidem2 just passes through whole, and since that token is not in your index, it's not found. Hmm, I wonder what the admin/analysis page would show here...

Anyway, you can probably get what you want by changing the index-time definition of WordDelimiterFilterFactory from catenateAll="0" to catenateAll="1". That will put Sidem, 2, and Sidem2 in your index. Then, since query-time processing for wildcards does _not_ break things up, Sidem2 will go through whole at query time and the doc should be found. Of course, you have to reindex your docs after the change.

Trying to allow wildcards at query time for filters that emit multiple output tokens per input token is an utter and complete disaster.

HTH,
Erick

On Wed, Jun 17, 2015 at 10:56 AM, Ellington Kirby ellingtonkirb...@gmail.com wrote:
Hi! I am a Solr user having an issue with matches on searches using the wildcard operators, specifically when the searches include a wildcard operator with a number. Here is an example. My query will look like (productTitle:*Sidem2*) and match nothing, when it should be matching the productTitle Sidem2. However, searching for Sidem will match the productTitle Sidem2.
In addition, I have isolated it to only fail to match when the productTitle has a number in it, for example a query for (productTitle:*Cupx Collapsed*) will correctly match the product Cupx Collapsed. I need to use the wildcard operators around the query so that an auto-complete feature can be used, where if a user stops typing at a certain point, a search will be executed on their input so far and it will match the correct product titles. I have looked all over, through the excellent book Solr In Action by Grainger and Potter, through Stack Overflow and several blog posts and have not found anything on this specific issue. Common advice is to remove the stemmer, which I have done. I have also added the ReversedWildcardFilterFactory. Here is a copy of my schema for the specific fieldType if that is any help.

[fieldType definition snipped; see the original message]

Thank you in advance! -- From a sincerely puzzled Solr user, Ellington Kirby
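For reference, Erick's suggested change amounts to flipping one attribute on the index-time WordDelimiterFilterFactory (a sketch of just that filter line; the rest of the fieldType stays as posted, and a full reindex is required afterwards):

```xml
<!-- index-time analyzer: catenateAll="1" keeps the joined token
     ("Sidem2") alongside the split parts ("Sidem", "2"), so an
     unanalyzed wildcard query like *Sidem2* can match. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1"
        catenateAll="1" splitOnCaseChange="1"/>
```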
API support upload file for External File Field
Is there any API for uploading a file for an ExternalFileField to the /data/ directory, or any good practice for this? My application and Solr server are on two physically separate machines. The application calculates a score and generates a file for the ExternalFileField. Thanks for any input.
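As far as I know there is no dedicated upload API for this; a common practice is to copy the file over and then trigger a new searcher so it gets reloaded. A sketch under assumptions (host, data directory, core name, and field name "rank" are all placeholders; reloading on commit relies on an ExternalFileFieldReloader newSearcher listener being configured in solrconfig.xml):

```shell
# Copy the generated score file to Solr's data dir for the core;
# external file field data is read from a file named
# external_<fieldname> in the index data directory.
scp external_rank solr-host:/var/solr/data/mycore/data/external_rank

# Trigger a commit so a new searcher opens and (with the reloader
# listener configured) picks up the new scores.
curl "http://solr-host:8983/solr/mycore/update?commit=true"
```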
Re: Please help test the new Angular JS Admin UI
The intention very much is to do a Collections API pane. In fact, I've got a first pass made already that can create/delete collections, and show the details of a collection and its replicas. But I want to focus on getting the feature-for-feature replacement working first. If we don't do that, then we can't make it the default, and then we create a divided experience between those who want a working UI and those who want the cool new features. A decent Collections API tab really won't take that long, I don't think, once we've given the new version a good shake-down.

Upayavira

On Wed, Jun 17, 2015, at 02:50 PM, Anshum Gupta wrote:
[earlier message and quoted announcement snipped]
Re: Where is schema.xml ?
On Wed, Jun 17, 2015, at 02:49 PM, TK Solr wrote:
I do see a file named managed-schema without .xml extension in the conf directory. Its content does look like a schema.xml file. Is this an initial content of in-memory schema, and schema API updates the schema dynamically?

Yup, that's how I understand it. You should not edit that file directly.

Upayavira
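With a managed schema, changes go through the Schema API rather than through the file; a sketch using the "foo" core from this thread (the field name is a placeholder):

```shell
# Add a field to the managed schema over HTTP; Solr rewrites the
# managed-schema file itself, so no hand-editing is needed.
curl -X POST -H 'Content-type:application/json' \
  "http://localhost:8983/solr/foo/schema" --data-binary '{
    "add-field": {
      "name": "mynewfield",
      "type": "string",
      "stored": true
    }
  }'
```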
Re: Please help test the new Angular JS Admin UI
This kind of feedback is _very_ valuable, many thanks to all. I may be the one committing this, but Upayavira is doing all the work, so hats off to him. And it's time for anyone who likes UI work to step up and contribute ;). I'll be happy to commit changes. Just link any JIRAs (especially ones with patches attached) to SOLR-7666 and I'll see them. Or mention me in the new JIRA and I'll link them. Needless to say, UI work isn't something I'm very good at...

On Wed, Jun 17, 2015 at 5:55 PM, Upayavira u...@odoko.co.uk wrote:
We can get things like this in. If you want, feel free to have a go. As much as I want to work on funky new stuff, I really need to focus on finishing stuff first.
Upayavira

On Wed, Jun 17, 2015, at 02:53 PM, Anshum Gupta wrote:
[earlier messages and quoted announcement snipped]
XPathEntityProcessor on a CLOB field
My requirement is to read the XML from a CLOB field and parse it to get the entity. The data config is shown below. I am trying to map two fields, 'event' and 'policyNumber', for the entity 'catReport'.

dataSource name=mbdev driver=oracle.jdbc.driver.OracleDriver url=jdbc:oracle:thin:@localhost:1521:orcl user=xyz password=xyz/
document name=insight
  entity name=input query=select * from test logLevel=debug datasource=mbdev transformer=ClobTransformer, script:toDate
    field column=LOAD_DATE name=load_date /
    field column=RESPONSE_XML name=RESPONSE_XML clob=true /
  dataSource name=xmldata type=FieldReaderDataSource/
  entity name=catReport dataSource=xmldata dataField=input.RESPONSE_XML processor=XPathEntityProcessor forEach=/*:DecisionServiceRs rootEntity=true logLevel=debug
    field column=event xpath=/dec:DecisionServiceRs/@event/

I am getting this error:

Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: null Processing Document # 1
  at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:321)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:278)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:53)
  at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
  at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)

I can see that the CLOB is being converted to a String correctly, and the log has this entry where the XML is printed: Exception while processing: input document : SolrInputDocument(fields: [RESPONSE_XML=dec:Deci. I do not know why the error is thrown at the JDBC layer when the CLOB is converted to a String and passed to the FieldReader, and I do not know how to make this work. Thanks Pattabi
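Conceptually, the config above reads RESPONSE_XML from the JDBC row, hands the string to the FieldReaderDataSource, and XPathEntityProcessor then extracts fields from it. A minimal Python sketch of that second stage (this is not Solr code, and the payload below is a hypothetical example invented for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical CLOB content, as the ClobTransformer would hand it over
# after converting the Oracle CLOB to a plain string.
response_xml = (
    '<dec:DecisionServiceRs xmlns:dec="http://example.com/dec" event="approve">'
    '<dec:policyNumber>P-1001</dec:policyNumber>'
    '</dec:DecisionServiceRs>'
)

def extract_fields(xml_text):
    """Mimic what XPathEntityProcessor does for this config:
    grab the @event attribute off the root element."""
    root = ET.fromstring(xml_text)
    # forEach=/*:DecisionServiceRs matches the root regardless of prefix;
    # the event attribute itself is unprefixed in this sample payload.
    return {"event": root.attrib.get("event")}

print(extract_fields(response_xml))  # {'event': 'approve'}
```

If this stage works in isolation, the failure is in the JDBC stage feeding it, which matches the stack trace above.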
Re: Solr's suggester results
I'm using the FreeTextLookupFactory in my implementation now. Yes, now it can suggest part of the field from the middle of the content. I read that this implementation is able to consider the previous tokens when making the suggestions. However, when I enter a search phrase, it seems to consider only the last token and not any of the previous tokens. For example, when I search for http://localhost:8983/edm/collection1/suggest?suggest.q=trouble free, it gives me suggestions based on the word 'free' only, and not 'trouble free'. This is my configuration:

In solrconfig.xml:

searchComponent name=suggest class=solr.SuggestComponent
  lst name=suggester
    str name=lookupImplFreeTextLookupFactory/str
    str name=indexPathsuggester_freetext_dir/str
    str name=dictionaryImplDocumentDictionaryFactory/str
    str name=fieldSuggestion/str
    str name=suggestFreeTextAnalyzerFieldTypesuggestType/str
    str name=ngrams5/str
    str name=buildOnStartupfalse/str
    str name=buildOnCommitfalse/str
  /lst
/searchComponent

requestHandler name=/suggest class=solr.SearchHandler startup=lazy
  lst name=defaults
    str name=wtjson/str
    str name=indenttrue/str
    str name=suggesttrue/str
    str name=suggest.count10/str
    str name=suggest.dictionarymySuggester/str
  /lst
  arr name=components
    strsuggest/str
  /arr
/requestHandler

In schema.xml:

fieldType name=suggestType class=solr.TextField positionIncrementGap=100
  analyzer
    charFilter class=solr.PatternReplaceCharFilterFactory pattern=[^a-zA-Z0-9] replacement= /
    tokenizer class=solr.WhitespaceTokenizerFactory/
    filter class=solr.ShingleFilterFactory maxShingleSize=5 outputUnigrams=true/
    filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt /
  /analyzer
/fieldType

Is there anything I configured wrongly? I've set ngrams to 5, which means it should consider up to the previous 5 tokens entered?
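For reference, the ShingleFilterFactory in the analyzer above, with maxShingleSize=5 and outputUnigrams=true, emits the single token at each position plus word n-grams of two to five tokens. A toy Python sketch of that expansion (illustrative only, not Lucene's implementation):

```python
def shingles(tokens, max_size=5, output_unigrams=True):
    """Emit word n-grams (shingles) the way ShingleFilterFactory does:
    for each start position, the single token itself (when
    output_unigrams is true) plus joined runs of 2..max_size tokens."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(2, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out

print(shingles(["trouble", "free", "search"], max_size=5))
# ['trouble', 'trouble free', 'trouble free search',
#  'free', 'free search', 'search']
```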
Regards, Edwin On 17 June 2015 at 22:12, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Edwin, The spellcheck is one thing, the Suggester is another. If you need to provide auto-suggestion to your users, the suggester is the right thing to use. But I really doubt it is useful to select the entire content as the suggester field; it is going to be quite expensive. In any case, I would again suggest you take a look at the article I quoted and the general Solr documentation. It is possible to suggest part of the field. You can use the FreeText suggester with a proper analysis selected. Cheers 2015-06-17 6:14 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Yes I've looked at that before, but I was told that the newer version of Solr has its own suggester, and does not need to use spellchecker anymore? So it's not necessary to use the spellchecker inside suggester anymore? Regards, Edwin On 17 June 2015 at 11:56, Erick Erickson erickerick...@gmail.com wrote: Have you looked at spellchecker? Because that sounds much more like what you're asking about than suggester. Spell checking is more what you're asking for; have you even looked at that after it was suggested? bq: Also, when I do a search, it shouldn't be returning whole fields, but just to return a portion of the sentence This is what highlighting is built for. Really, I recommend you take the time to do some familiarization with the whole search space and Solr. The excellent book here: http://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021/ref=sr_1_1?ie=UTF8qid=1434513284sr=8-1keywords=apache+solrpebp=1434513287267perid=0YRK508J0HJ1N3BAX20E will give you the grounding you need to get the most out of Solr. Best, Erick On Tue, Jun 16, 2015 at 8:27 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: The long content is from when I tried to index PDF files.
As some PDF files have a lot of words in the content, it leads to the *UTF8 encoding is longer than the max length 32766 error.* I think the problem is that the content size of the PDF file exceeds 32766 characters? I'm trying to be able to index documents of any size (even those with very large contents), and build the suggester from there. Also, when I do a search, it shouldn't be returning whole fields, but just a portion of the sentence. Regards, Edwin On 16 June 2015 at 23:02, Erick Erickson erickerick...@gmail.com wrote: The suggesters are built to return whole fields. You _might_ be able to add multiple fragments to a multiValued entry and get fragments; I haven't tried that though, and I suspect you'd actually get the same thing. This is an XY problem IMO. Please describe exactly what you're trying to accomplish, with examples, rather than continue to pursue this path. It sounds like you want spellcheck or
Re: adding XML data to SOLR index using DIH (xml-data-config)
There's no a-priori reason you should need to do this. What's your evidence here? What behaviors do you see when you try this? Details matter as Hoss would say. Give us an example of what changes in the XML file (and/or schema) you see that you think require re-indexing. Of course if you're adding new fields to schema.xml you need to reload (or restart) Solr. Best, Erick On Wed, Jun 17, 2015 at 12:06 PM, Morris, Paul E. pmor...@nsf.gov wrote: We regularly create a SOLR index from XML files, using the DIH with a suitably edited xml-data-config.xml. However, whenever new XML become available it seems like we have to rebuild the entire index again using the Data Import Handler. Are we missing something? Should it be possible to add new XML to the index using /dataimport with delta-import selected? Many thanks if anyone has been able to add new XML files to the SOLR index without reindexing everything again. Paul
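For the two /dataimport options mentioned in this thread: a delta-import only works if the entity defines delta queries, while a full-import with clean=false adds and updates documents without first deleting the existing index. A small Python sketch that just builds those request URLs (the host and core name are placeholders, not from the original message):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/mycore/dataimport"  # placeholder core

def dih_url(command, **params):
    """Build a Data Import Handler request URL for the given command."""
    query = {"command": command, **params}
    return BASE + "?" + urlencode(query)

# Re-run the import but keep existing documents (no clean step):
print(dih_url("full-import", clean="false", commit="true"))
# Delta import, if the entity defines delta queries:
print(dih_url("delta-import", commit="true"))
```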
Re: QueryParser to translate query arguments
On Wed, Jun 17, 2015 at 2:44 PM, Sreekant Sreedharan sreeka...@alamy.com wrote: I have a requirement to make SOLR a turnkey replacement for our legacy search engine. To do this, the queries supported by the legacy search engine have to be supported by SOLR, so I have implemented a QueryParser. I've implemented it several ways: 1. I've copied the implementation in LuceneQParser, which uses the SolrQueryParser, and essentially replaced the params of my QParser with an instance of the ModifiableSolrParams object, taking care to copy what exists in the previous params object and replacing the 'fq' argument that is mapped from the query argument supported by the legacy search engine. The problem with this approach is that ModifiableSolrParams does not allow you to have multiple fq arguments in it. http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/common/params/ModifiableSolrParams.html#add%28java.lang.String,%20java.lang.String...%29 But in some cases, we need to support multiple field restrictions. I would have preferred this solution because I imagine that leveraging SOLR's robust query parsing mechanism is easier than building a Lucene Query from scratch. 2. The second approach uses a BooleanQuery and attempts to construct the entire query from the query parameters. This approach seemed more promising, and works for most field restrictions. But I hit a roadblock. The filter seems to work for all string fields, but when I declare a field as an integer field in my schema.xml config file, the search does not return the very same documents. I am not sure why? Integers are encoded differently in the index from how we print them; it's done by calling the FieldType through the Analyzer in QueryBuilder.createFieldQuery(Analyzer, Occur, String, String, boolean, int), so it's a much longer journey. I was wondering what the best approach to this problem is (either 1 or 2 above, or something even better).
And I was wondering how to fix the problem in each of the above cases. -- View this message in context: http://lucene.472066.n3.nabble.com/QueryParser-to-translate-query-arguments-tp4212394.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: How to create concatenated token
Dear Erick, e.g. Solr training *Porter:-* solr train Position 1 2 *Concatenated :-* solr train solrtrain Position 1 2 I did implement the filter as per my requirement. Thank you so much for your help and guidance. So how could I contribute it to Solr? With Regards Aman Tandon On Wed, Jun 17, 2015 at 10:14 AM, Aman Tandon amantandon...@gmail.com wrote: Hi Erick, Thank you so much, it will be helpful for me to learn how to save the state of a token. I had no idea how to save the state of previous tokens, so it was difficult to generate a concatenated token at the end. Is there anything I should read to learn more about it? With Regards Aman Tandon On Wed, Jun 17, 2015 at 9:20 AM, Erick Erickson erickerick...@gmail.com wrote: I really question the premise, but have a look at: https://issues.apache.org/jira/browse/SOLR-7193 Note that this is not committed and I haven't reviewed it, so I don't have anything to say about that. And you'd have to implement it as a custom Filter. Best, Erick On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, Any guesses how I could achieve this behaviour? With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com wrote: e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) typo error e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training) With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com wrote: We have some business logic to search the user query against user intent, i.e. finding the exact matching products. e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) As we can see it is a phrase query, so it will take more time than a single stemmed-token query. There are also 5-7 word phrase queries. So we want to reduce the search time by implementing this feature.
With Regards Aman Tandon On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Can I ask you why you need to concatenate the tokens ? Maybe we can find a better solution to concat all the tokens in one single big token . I find it difficult to understand the reasons behind tokenising, token filtering and then un-tokenizing again :) It would be great if you explain a little bit better what you would like to do ! Cheers 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a requirement to create the concatenated token of all the tokens created from the last item of my analyzer chain. *Suppose my analyzer chain is :* * tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.WordDelimiterFilterFactory catenateAll=1 splitOnNumerics=1 preserveOriginal=1/filter class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15 side=front /filter class=solr.PorterStemmerFilterFactory/* I want to create a concatenated token plugin to add at concatenated token along with the last token. e.g. Solr training *Porter:-* solr train Position 1 2 *Concatenated :-* solr train solrtrain Position 1 2 Please help me out. How to create custom filter for this requirement. With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
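The behaviour described in this thread — emit each analyzed token, then one extra token that concatenates the whole stream, stacked at the last position — can be sketched outside Lucene in a few lines of Python (a custom TokenFilter in Java would follow the same buffer-then-emit pattern discussed in SOLR-7193):

```python
def concat_filter(tokens):
    """Pass tokens through unchanged, then append one extra token that
    concatenates the whole stream, returned as (term, position) pairs.
    The concatenated token shares the position of the last token
    (a position increment of 0), matching the example in the thread:
    'solr train' -> solr@1, train@2, solrtrain@2."""
    out = [(tok, pos) for pos, tok in enumerate(tokens, start=1)]
    if tokens:
        out.append(("".join(tokens), len(tokens)))
    return out

print(concat_filter(["solr", "train"]))
# [('solr', 1), ('train', 2), ('solrtrain', 2)]
```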
SolrCloud Docker environment
Hi all, maybe someone could be interested in this. I have created a suite of Docker images, Dockerfiles and bash scripts useful to deploy a Zookeeper ensemble with 3 or more instances and a SolrCloud (v. 4 or 5) cluster. The SolrCloud 4 cluster is based on Tomcat 7. https://github.com/freedev/solrcloud-zookeeper-docker This could be interesting for applications that need a zookeeper/solrcloud cluster. Best regards, Vincenzo -- Vincenzo D'Amore email: v.dam...@gmail.com skype: free.dev mobile: +39 349 8513251
QueryParser to translate query arguments
I have a requirement to make SOLR a turnkey replacement for our legacy search engine. To do this, the queries supported by the legacy search engine have to be supported by SOLR, so I have implemented a QueryParser. I've implemented it several ways: 1. I've copied the implementation in LuceneQParser, which uses the SolrQueryParser, and essentially replaced the params of my QParser with an instance of the ModifiableSolrParams object, taking care to copy what exists in the previous params object and replacing the 'fq' argument that is mapped from the query argument supported by the legacy search engine. The problem with this approach is that ModifiableSolrParams does not allow you to have multiple fq arguments in it. But in some cases, we need to support multiple field restrictions. I would have preferred this solution because I imagine that leveraging SOLR's robust query parsing mechanism is easier than building a Lucene Query from scratch. 2. The second approach uses a BooleanQuery and attempts to construct the entire query from the query parameters. This approach seemed more promising, and works for most field restrictions. But I hit a roadblock. The filter seems to work for all string fields, but when I declare a field as an integer field in my schema.xml config file, the search does not return the very same documents. I am not sure why? I was wondering what the best approach to this problem is (either 1 or 2 above, or something even better). And I was wondering how to fix the problem in each of the above cases. -- View this message in context: http://lucene.472066.n3.nabble.com/QueryParser-to-translate-query-arguments-tp4212394.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr/lucene index merge and optimize performance improvement
On Tue, 2015-06-16 at 09:54 -0700, Shenghua(Daniel) Wan wrote: Hi, Toke, Did you try MapReduce with solr? I think it should be a good fit for your use case. Thanks for the suggestion. Improved logistics, such as starting build of a new shard while the previous shard is optimizing, would work for us. Switching to a new controlling layer is not trivial, so the win by better utilization during the optimization phase is not enough in itself to pay the cost. - Toke Eskildsen, State and University Library, Denmark
Re: Indexing search list of Key/Value pairs
Hi, Found the best way to do it (for those who read this in the future). Starting from Solr 4.8, nested documents can be used, so for each document we can create a child document with the key and value as fields for each key; using block join queries will then close the loop and give the ability to search for documents with a nested document matching the query. Hope this will help. Thanks, Ami -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-search-list-of-Key-Value-pairs-tp4156206p4212357.html Sent from the Solr - User mailing list archive at Nabble.com.
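As a concrete illustration of the approach Ami describes: the key/value pairs become child documents, and a block-join parent query finds parents with a matching child. The JSON child-document shape and the {!parent} query syntax are standard Solr; the field names and ids below are made up for the example:

```python
import json

# Parent document with one child per key/value pair; Solr indexes
# the parent and its children together as one block.
doc = {
    "id": "doc-1",
    "type_s": "parent",
    "_childDocuments_": [
        {"id": "doc-1-kv-1", "key_s": "color", "value_s": "red"},
        {"id": "doc-1-kv-2", "key_s": "size", "value_s": "XL"},
    ],
}
payload = json.dumps([doc])

# Block-join query: parents having a child where key=color AND value=red.
q = '{!parent which="type_s:parent"}(key_s:color AND value_s:red)'
print(q)
```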
Multivalued fields order of storing is guaranteed ?
Hello, I am using Solr 5.10 and I have a use case to fit in. Let's say I define 2 fields, group-name and group-id, both multivalued and stored. 1) Now I add the following values to each of them: group-name {a,b,c} and group-id {1,2,3}. 2) Now I want to add a new value to each of these 2 fields, {d} and {4}; my requirement is that it should add these new values such that when I query these 2 fields they return {a,b,c,d} and {1,2,3,4} in this order, i.e. a=1, d=4. Is it guaranteed that stored multivalued fields maintain the order of insertion, or do I need to explicitly handle this scenario? Any help is appreciated. Thanks, Alok -- View this message in context: http://lucene.472066.n3.nabble.com/Multivalued-fields-order-of-storing-is-guaranteed-tp4212383.html Sent from the Solr - User mailing list archive at Nabble.com.
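The use case relies on pairing two parallel multivalued fields by position, which a small sketch makes explicit (the pairing logic below is client-side illustration, not Solr behaviour — it simply assumes the stored values come back in insertion order, which is what the question asks Solr to guarantee):

```python
# Parallel multivalued fields, as returned by a stored-field fetch,
# assuming insertion order is preserved.
group_name = ["a", "b", "c", "d"]
group_id = [1, 2, 3, 4]

# The whole scheme depends on index i of one list corresponding
# to index i of the other:
pairs = dict(zip(group_name, group_id))
print(pairs)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```

If that ordering assumption is ever in doubt, encoding each pair in a single value (e.g. "a:1") or using child documents avoids the positional coupling entirely.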
Re: How to add a new child to existing document?
It doesn't work by design; you have to re-write the whole block. https://issues.apache.org/jira/browse/SOLR-6596 On Wed, Jun 17, 2015 at 11:44 AM, Maya G maiki...@gmail.com wrote: Hey, I'm trying to add a new child to an existing document. When I query for the child doc it doesn't return it. I'm using Solr 4.10.5. Thank you, Maya -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-add-a-new-child-to-existing-document-tp4212365.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Dedupe in a SolrCloud
Hi, I am trying to use the dedupe feature to detect and mark near-duplicate content in my collections. I don't want to prevent duplicate content; I would like to detect it and keep it for further processing. That's why I'm using an extra field and not the document's unique field. Here is how I added it to solrconfig.xml:

requestHandler name=/update class=solr.UpdateRequestHandler
  lst name=defaults
    str name=update.chainfill_signature/str
  /lst
/requestHandler

updateRequestProcessorChain name=fill_signature processor=signature
  processor class=solr.RunUpdateProcessorFactory /
/updateRequestProcessorChain

updateProcessor class=solr.processor.SignatureUpdateProcessorFactory name=signature
  bool name=enabledtrue/bool
  str name=signatureFieldsignature/str
  bool name=overwriteDupesfalse/bool
  str name=fieldscontent/str
  str name=signatureClasssolr.processor.TextProfileSignature/str
  str name=quantRate.2/str
  str name=minTokenLen3/str
/updateProcessor

When I initially add the documents to the cloud everything works as expected: the documents are added and the signature is created and added. Perfect :) The problem occurs when I want to update an existing document. In that case the update.chain=fill_signature parameter will of course be set too, and I get a bad request error. I found this Solr issue: https://issues.apache.org/jira/browse/SOLR-3473 Is that the problem I am running into? Is it somehow possible to add parameters or set a specific update handler when I'm adding documents to the cloud using SolrJ? In that case I could either set the update.chain manually and remove it from the request handler, or write a second request handler which I only use when I want to set the signature field. I know I can do that manually when I'm using e.g. curl, but is it also possible with SolrJ? :) Thanks, Markus
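For intuition on the TextProfileSignature configured above: it builds a fuzzy fingerprint from the document's frequent tokens, so near-duplicates hash to the same value. A toy Python version of the same idea (this is not Solr's algorithm — quantRate handling is simplified to a plain count threshold — just the shape of it):

```python
import hashlib
import re

def toy_signature(text, min_token_len=3, min_count=2):
    """Fingerprint frequent tokens only: tokenize, drop short tokens,
    keep tokens appearing at least min_count times, sort, and hash.
    Rare words (typos, small edits) then don't change the signature."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = {}
    for t in tokens:
        if len(t) >= min_token_len:
            counts[t] = counts.get(t, 0) + 1
    profile = sorted(t for t, c in counts.items() if c >= min_count)
    return hashlib.md5(" ".join(profile).encode()).hexdigest()

a = toy_signature("the cat sat on the mat, the cat sat again")
b = toy_signature("the cat sat on the mat, the cat sat agian")  # typo
print(a == b)  # True: the rare (misspelled) word is below min_count
```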
ZooKeeper connection refused
Hi. I have a SolrCloud cluster with 3 nodes Solr + Zookeeper. My solr.in.sh file is configured as follows:

ZK_HOST=zk1,zk2,zk3

All worked well, but now I cannot start the SOLR nodes and the command exits with the following errors:

root@index1:~# service solr restart
Sending stop command to Solr running on port 8983 ... waiting 5 seconds to allow Jetty process 32087 to stop gracefully.
Waiting to see Solr listening on port 8983 [\] Still not seeing Solr listening on 8983 after 30 seconds!
WARN - 2015-06-17 10:18:37.158; [ ] org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
(the same WARN and ConnectException repeat at 10:18:37.823, 10:18:38.990, 10:18:40.543 and 10:18:42.174)

I can telnet to the ZooKeeper port:

root@index1:~# telnet zk1 2181
Trying 192.168.70.31...
Connected to index1.dc.my.network.
Escape character is '^]'.

Could you help me please? Thank you very much! Bye
How to add a new child to existing document?
Hey, I'm trying to add a new child to an existing document. When I query for the child doc it doesn't return it. I'm using Solr 4.10.5. Thank you, Maya -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-add-a-new-child-to-existing-document-tp4212365.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: suggester returning stems instead of whole words
ah looks like I need to use copyField to get a non stemmed version of the suggester field Alistair -- mov eax,1 mov ebx,0 int 80h On 17/06/2015 11:15, Alistair Young alistair.yo...@uhi.ac.uk wrote: I was wondering if there's a way to get the suggester to return whole words. Instead of returning 'technology' , 'temperature' and 'tutorial', it's returning 'technolog' , 'temperatur' and 'tutori' using this config: searchComponent class=solr.SpellCheckComponent name=suggest lst name=spellchecker str name=namesuggest/str str name=classnameorg.apache.solr.spelling.suggest.Suggester/str str name=lookupImplorg.apache.solr.spelling.suggest.fst.WFSTLookupFactory/ str str name=fielddc.subject/str float name=threshold0.005/float str name=buildOnCommittrue/str /lst /searchComponent requestHandler class=org.apache.solr.handler.component.SearchHandler name=/suggest lst name=defaults str name=spellchecktrue/str str name=spellcheck.dictionarysuggest/str str name=spellcheck.onlyMorePopulartrue/str str name=spellcheck.count10/str str name=spellcheck.collatetrue/str /lst arr name=components strsuggest/str /arr /requestHandler thanks, Alistair -- mov eax,1 mov ebx,0 int 80h
suggester returning stems instead of whole words
I was wondering if there's a way to get the suggester to return whole words. Instead of returning 'technology', 'temperature' and 'tutorial', it's returning 'technolog', 'temperatur' and 'tutori' using this config:

searchComponent class=solr.SpellCheckComponent name=suggest
  lst name=spellchecker
    str name=namesuggest/str
    str name=classnameorg.apache.solr.spelling.suggest.Suggester/str
    str name=lookupImplorg.apache.solr.spelling.suggest.fst.WFSTLookupFactory/str
    str name=fielddc.subject/str
    float name=threshold0.005/float
    str name=buildOnCommittrue/str
  /lst
/searchComponent

requestHandler class=org.apache.solr.handler.component.SearchHandler name=/suggest
  lst name=defaults
    str name=spellchecktrue/str
    str name=spellcheck.dictionarysuggest/str
    str name=spellcheck.onlyMorePopulartrue/str
    str name=spellcheck.count10/str
    str name=spellcheck.collatetrue/str
  /lst
  arr name=components
    strsuggest/str
  /arr
/requestHandler

thanks, Alistair -- mov eax,1 mov ebx,0 int 80h
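What is happening here is that the suggester dictionary is built from the analyzed (stemmed) terms of dc.subject, so the stored entries are stems, not whole words. A toy suffix-stripper makes the effect visible (this is not the Porter algorithm used by Solr's stemming filters, just an illustration of why stored terms end up truncated):

```python
def toy_stem(word, suffixes=("al", "e", "y")):
    """Strip the first matching suffix, crudely imitating what a
    stemming filter does to terms before they reach the index."""
    for s in suffixes:
        if word.endswith(s) and len(word) > len(s) + 2:
            return word[: -len(s)]
    return word

print([toy_stem(w) for w in ["technology", "temperature", "tutorial"]])
# ['technolog', 'temperatur', 'tutori']
```

Because the dictionary only ever sees the post-analysis terms, the fix is to build the suggester from a field whose analysis chain has no stemmer.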
Re: ZooKeeper connection refused
You are asking telnet to connect to zk1 on port 2181 but you have not specified the port to Solr. You should set ZK_HOST=zk1:2181,zk2:2181,zk3:2181 instead.

On Wed, Jun 17, 2015 at 3:53 PM, shacky shack...@gmail.com wrote:

Hi. I have a SolrCloud cluster with 3 nodes, Solr + ZooKeeper. My solr.in.sh file is configured as follows: ZK_HOST=zk1,zk2,zk3. Everything worked fine, but now I cannot start the Solr nodes and the command exits with the following errors:

root@index1:~# service solr restart
Sending stop command to Solr running on port 8983 ... waiting 5 seconds to allow Jetty process 32087 to stop gracefully.
Waiting to see Solr listening on port 8983 [\] Still not seeing Solr listening on 8983 after 30 seconds!
WARN - 2015-06-17 10:18:37.158; [ ] org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
(the same WARN and stack trace repeat at 10:18:37.823, 10:18:38.990, 10:18:40.543 and 10:18:42.174)

I can telnet to the ZooKeeper port:

root@index1:~# telnet zk1 2181
Trying 192.168.70.31...
Connected to index1.dc.my.network.
Escape character is '^]'.

Could you help me please? Thank you very much! Bye

-- Regards, Shalin Shekhar Mangar.
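For anyone landing on this thread: the fix is to include ZooKeeper's client port in ZK_HOST. A sketch of the relevant solr.in.sh lines, using the hostnames from this thread (the /solr chroot is an optional extra not used in the thread):

```shell
# solr.in.sh
ZK_HOST="zk1:2181,zk2:2181,zk3:2181"

# Optionally isolate this cluster's state under a chroot:
# ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"
```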
Re: suggester returning stems instead of whole words
Did you change the SpellCheckComponent's configuration to use subject_autocomplete instead of dc.subject? After you made that change, did you invoke spellcheck.build=true to re-build the spellcheck index?

On Wed, Jun 17, 2015 at 7:06 PM, Alistair Young alistair.yo...@uhi.ac.uk wrote:

copyField doesn't seem to fix the suggestion stemming. Copying the field to another field of this type:

<field name="subject_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<copyField source="dc.subject" dest="subject_autocomplete"/>
<fieldType class="solr.TextField" name="text_auto" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

but I'm still getting stemmed suggestions after rebuilding the index. Alistair

On 17/06/2015 11:28, Alistair Young alistair.yo...@uhi.ac.uk wrote: ah, looks like I need to use copyField to get a non-stemmed version of the suggester field. Alistair

On 17/06/2015 11:15, Alistair Young alistair.yo...@uhi.ac.uk wrote: I was wondering if there's a way to get the suggester to return whole words. Instead of returning 'technology', 'temperature' and 'tutorial', it's returning 'technolog', 'temperatur' and 'tutori' using this config:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="field">dc.subject</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components"><str>suggest</str></arr>
</requestHandler>

thanks, Alistair

-- mov eax,1 mov ebx,0 int 80h

-- Regards, Shalin Shekhar Mangar.
Re: suggester returning stems instead of whole words
copyField doesn't seem to fix the suggestion stemming. Copying the field to another field of this type:

<field name="subject_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<copyField source="dc.subject" dest="subject_autocomplete"/>
<fieldType class="solr.TextField" name="text_auto" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

but I'm still getting stemmed suggestions after rebuilding the index. Alistair

On 17/06/2015 11:28, Alistair Young alistair.yo...@uhi.ac.uk wrote: ah, looks like I need to use copyField to get a non-stemmed version of the suggester field. Alistair

On 17/06/2015 11:15, Alistair Young alistair.yo...@uhi.ac.uk wrote: I was wondering if there's a way to get the suggester to return whole words. Instead of returning 'technology', 'temperature' and 'tutorial', it's returning 'technolog', 'temperatur' and 'tutori' using this config:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="field">dc.subject</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components"><str>suggest</str></arr>
</requestHandler>

thanks, Alistair

-- mov eax,1 mov ebx,0 int 80h
Re: How to create concatenated token
If you used the JIRA I linked, vote for it, add any improvements, etc. Anyone can attach a patch to a JIRA; you just have to create a login. That said, this may be too rare a use-case to deal with. I just thought of shingling, which I should have suggested before; that will work for concatenating small numbers of tokens, which I'd guess is the case here. I mean, do you really want to concatenate 50 tokens? Best, Erick

On Wed, Jun 17, 2015 at 12:07 AM, Aman Tandon amantandon...@gmail.com wrote: Dear Erick, e.g. "Solr training". Porter: solr train (positions 1, 2). Concatenated: solr train solrtrain (positions 1, 2). I did implement the filter as per my requirement. Thank you so much for your help and guidance. So how could I contribute it to Solr? With Regards, Aman Tandon

On Wed, Jun 17, 2015 at 10:14 AM, Aman Tandon amantandon...@gmail.com wrote: Hi Erick, thank you so much; it will be helpful for me to learn how to save the state of a token. I had no idea how to save the state of previous tokens, which made it difficult to generate a concatenated token at the end. So is there anything I should read to learn more about it? With Regards, Aman Tandon

On Wed, Jun 17, 2015 at 9:20 AM, Erick Erickson erickerick...@gmail.com wrote: I really question the premise, but have a look at: https://issues.apache.org/jira/browse/SOLR-7193 Note that this is not committed and I haven't reviewed it, so I don't have anything to say about that. And you'd have to implement it as a custom Filter. Best, Erick

On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, any guesses how I could achieve this behaviour? With Regards, Aman Tandon

On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com wrote: e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) (typo error); e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training) With Regards, Aman Tandon

On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com wrote: We have some business logic to search the user query within the user intent, i.e. finding the exact matching products. e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training). As we can see it is a phrase query, so it will take more time than a single stemmed-token query. There are also 5-7 word phrase queries. So we want to reduce the search time by implementing this feature. With Regards, Aman Tandon

On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Can I ask you why you need to concatenate the tokens? Maybe we can find a better solution than concatenating all the tokens into one single big token. I find it difficult to understand the reasons behind tokenising, token filtering and then un-tokenizing again :) It would be great if you could explain a little better what you would like to do! Cheers

2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a requirement to create a concatenated token of all the tokens produced at the end of my analyzer chain. Suppose my analyzer chain is:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" catenateAll="1" splitOnNumerics="1" preserveOriginal="1"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
<filter class="solr.PorterStemmerFilterFactory"/>

I want to create a concatenated-token plugin to add a concatenated token along with the last token. e.g. "Solr training". Porter: solr train (positions 1, 2). Concatenated: solr train solrtrain (positions 1, 2). Please help me out. How do I create a custom filter for this requirement?
With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
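The behaviour Aman implemented (emit every analyzed token, then append one synthetic token that concatenates them all at the last position) can be illustrated outside Lucene. This is only a sketch of the intended token stream, not the custom TokenFilter itself; a real filter would buffer tokens with captureState/restoreState and emit the synthetic token with a position increment of 0:

```java
import java.util.ArrayList;
import java.util.List;

public class ConcatSketch {
    // Given the tokens an analyzer chain produced, append one synthetic
    // token concatenating them all ("solr", "train" -> extra "solrtrain").
    public static List<String> withConcatenated(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens);
        if (tokens.size() > 1) {
            out.add(String.join("", tokens));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withConcatenated(List.of("solr", "train")));
    }
}
```

In the real filter the synthetic token must share the last token's position (positionIncrement = 0) so phrase queries are not disturbed; SOLR-7193, linked above, sketches one proposed implementation.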
Re: suggester returning stems instead of whole words
yep, did both of those things. Getting the same results as using dc.subject.

On 17/06/2015 14:44, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Did you change the SpellCheckComponent's configuration to use subject_autocomplete instead of dc.subject? After you made that change, did you invoke spellcheck.build=true to re-build the spellcheck index?

On Wed, Jun 17, 2015 at 7:06 PM, Alistair Young alistair.yo...@uhi.ac.uk wrote: copyField doesn't seem to fix the suggestion stemming. [...] but I'm still getting stemmed suggestions after rebuilding the index. [...]

-- Regards, Shalin Shekhar Mangar.
Re: Dedupe in a SolrCloud
Comments inline:

On Wed, Jun 17, 2015 at 3:18 PM, Markus.Mirsberger markus.mirsber...@gmx.de wrote:

Hi, I am trying to use the dedupe feature to detect and mark near-duplicate content in my collections. I don't want to prevent duplicate content; I would like to detect it and keep it for further processing. That's why I'm using an extra field and not the document's unique field. Here is how I added it to solrconfig.xml:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">fill_signature</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="fill_signature" processor="signature">
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<updateProcessor class="solr.processor.SignatureUpdateProcessorFactory" name="signature">
  <bool name="enabled">true</bool>
  <str name="signatureField">signature</str>
  <bool name="overwriteDupes">false</bool>
  <str name="fields">content</str>
  <str name="signatureClass">solr.processor.TextProfileSignature</str>
  <str name="quantRate">.2</str>
  <str name="minTokenLen">3</str>
</updateProcessor>

When I initially add the documents to the cloud everything works as expected: the documents are added and the signature is created and added. Perfect :) The problem occurs when I want to update an existing document. In that case the update.chain=fill_signature parameter will of course be set too, and I get a bad request error. I found this Solr issue: https://issues.apache.org/jira/browse/SOLR-3473 Is that the problem I am running into?

You haven't pasted the complete error response so I am guessing a bit here. It is possible that you are running into the same problem, i.e. the signature is being calculated again and the signature field, not being multi-valued, causes an error.

Is it somehow possible to add parameters or set a specific update handler when I'm adding documents to the cloud using SolrJ?

Yes, any custom parameter can be added to a SolrJ request. There is a setParam(String param, String value) method available in AbstractUpdateRequest which can be used to set a custom update.chain for each SolrJ request.

In that case I could either set the update.chain manually and remove it from the request handler, or write a second request handler which I only use if I want to set the signature field. I know I can do that manually when I'm using e.g. curl, but is it also possible with SolrJ? :) Thanks, Markus

-- Regards, Shalin Shekhar Mangar.
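Putting Shalin's suggestion into code: an untested sketch against the SolrJ API (the `client` variable is an assumed, already-initialized SolrClient; the chain name comes from this thread's config; error handling omitted):

```java
// Sketch only: assumes a running SolrCloud and an initialized SolrClient "client".
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1");
doc.addField("content", "page text to fingerprint");

UpdateRequest req = new UpdateRequest();
req.add(doc);
// setParam(...) is inherited from AbstractUpdateRequest, as noted above
req.setParam("update.chain", "fill_signature");
req.process(client);
client.commit();
```

With this, update.chain can be removed from the /update handler defaults so that ordinary updates skip the signature chain.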
Re: Multivalued fields order of storing is guaranteed ?
Thanks Yonik. -- View this message in context: http://lucene.472066.n3.nabble.com/Multivalued-fields-order-of-storing-is-guaranteed-tp4212383p4212428.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multivalued fields order of storing is guaranteed ?
On Wed, Jun 17, 2015 at 6:44 AM, Alok Bhandari alokomprakashbhand...@gmail.com wrote: Is it guaranteed that stored multivalued fields maintain order of insertion. Yes. -Yonik
Re: suggester returning stems instead of whole words
Hmmm, shouldn't be happening that way. Spellcheck is supposed to be looking at indexed terms. If you go into the admin/schema browser page and look at the new field, what are the terms in the index? They shouldn't be stemmed. And I always get confused where this <str name="spellcheck.dictionary">suggest</str> is supposed to point. Do you have any other component named suggest that you might be picking up? Best, Erick

On Wed, Jun 17, 2015 at 6:50 AM, Alistair Young alistair.yo...@uhi.ac.uk wrote: yep, did both of those things. Getting the same results as using dc.subject.

On 17/06/2015 14:44, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Did you change the SpellCheckComponent's configuration to use subject_autocomplete instead of dc.subject? After you made that change, did you invoke spellcheck.build=true to re-build the spellcheck index? [...]
Re: ZooKeeper connection refused
Is ZK healthy? Can you try the following from the server on which Solr is running: echo ruok | nc zk1 2181 On Wed, Jun 17, 2015 at 7:25 PM, shacky shack...@gmail.com wrote: 2015-06-17 15:34 GMT+02:00 Shalin Shekhar Mangar shalinman...@gmail.com: You are asking telnet to connect to zk1 on port 2181 but you have not specified the port to Solr. You should set ZK_HOST=zk1:2181,zk2:2181,zk3:2181 instead. I modified the ZK_HOST instance with the port, but the problem is not solved. Do you have any ideas? -- Regards, Shalin Shekhar Mangar.
Re: suggester returning stems instead of whole words
looking at the schema browser, subject_autocomplete has a type of text_en rather than text_auto, and all the terms are stemmed. Its contents are the same as the field it was copied from, dc.subject, which is text_en and stemmed.

On 17/06/2015 14:58, Erick Erickson erickerick...@gmail.com wrote: Hmmm, shouldn't be happening that way. Spellcheck is supposed to be looking at indexed terms. If you go into the admin/schema browser page and look at the new field, what are the terms in the index? They shouldn't be stemmed. And I always get confused where this <str name="spellcheck.dictionary">suggest</str> is supposed to point. Do you have any other component named suggest that you might be picking up? Best, Erick [...]
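Given Alistair's finding that subject_autocomplete is resolving to text_en, the schema presumably never picked up the text_auto definition (a common cause is editing the schema without reloading the core and reindexing). A sketch of the intended unstemmed setup, reconstructed from the configs quoted in this thread:

```xml
<!-- No stemmer anywhere in this chain, so indexed terms stay whole words -->
<fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="subject_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<copyField source="dc.subject" dest="subject_autocomplete"/>
```

After the schema change, reload the core, reindex, and rebuild the dictionary with spellcheck.build=true so the suggester sees the unstemmed terms.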
ManagedStopFilterFactory not accepting ignoreCase
We're running Solr 4.10.4 and getting this:

Caused by: java.lang.IllegalArgumentException: Unknown parameters: {ignoreCase=true}
    at org.apache.solr.rest.schema.analysis.BaseManagedTokenFilterFactory.init(BaseManagedTokenFilterFactory.java:46)
    at org.apache.solr.rest.schema.analysis.ManagedStopFilterFactory.init(ManagedStopFilterFactory.java:47)

This is the filter definition I used:

<filter class="solr.ManagedStopFilterFactory" ignoreCase="true" managed="english"/>

Any ideas? Thanks, Mike
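The IllegalArgumentException above is the managed factory rejecting analyzer-level arguments. With managed resources, ignoreCase is an initArg of the managed stopword set, configured over the REST API rather than on the <filter/> element. A sketch (URL, core name and resource handle are illustrative; check the managed resources documentation for your version):

```shell
# Keep the filter definition minimal:
#   <filter class="solr.ManagedStopFilterFactory" managed="english"/>
# ...and set ignoreCase on the managed resource instead:
curl -X POST -H 'Content-type: application/json' \
  --data-binary '{"initArgs":{"ignoreCase":true}}' \
  'http://localhost:8983/solr/collection1/schema/analysis/stopwords/english'
# A core reload is needed before the change takes effect.
```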
Re: Solr's suggester results
Edwin, the spellcheck is one thing, the Suggester is another. If you need to provide auto-suggestion to your users, the suggester is the right thing to use. But I really doubt it is useful to select the entire content as the suggester field; it is going to be quite expensive. In that case I would again really suggest you take a look at the article I quoted and the generic Solr documentation. It is possible to suggest part of the field: you can use the FreeText suggester with a proper analysis selected. Cheers

2015-06-17 6:14 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Yes, I've looked at that before, but I was told that the newer version of Solr has its own suggester, and does not need to use the spellchecker anymore? So it's not necessary to use the spellchecker inside the suggester anymore? Regards, Edwin

On 17 June 2015 at 11:56, Erick Erickson erickerick...@gmail.com wrote: Have you looked at spellchecker? Because that sounds much more like what you're asking about than suggester. Spell checking is more what you're asking for; have you even looked at that after it was suggested? bq: "Also, when I do a search, it shouldn't be returning whole fields, but just to return a portion of the sentence" - this is what highlighting is built for. Really, I recommend you take the time to do some familiarization with the whole search space and Solr. The excellent book here: http://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021/ref=sr_1_1?ie=UTF8qid=1434513284sr=8-1keywords=apache+solrpebp=1434513287267perid=0YRK508J0HJ1N3BAX20E will give you the grounding you need to get the most out of Solr. Best, Erick

On Tue, Jun 16, 2015 at 8:27 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: The long content is from when I tried to index PDF files. As some PDF files have a lot of words in the content, it will lead to the "UTF8 encoding is longer than the max length 32766" error. I think the problem is that the content size of the PDF file exceeds 32766 characters? I'm trying to accomplish being able to index documents of any size (even those with very large contents), and build the suggester from there. Also, when I do a search, it shouldn't return whole fields, but just a portion of the sentence. Regards, Edwin

On 16 June 2015 at 23:02, Erick Erickson erickerick...@gmail.com wrote: The suggesters are built to return whole fields. You _might_ be able to add multiple fragments to a multiValued entry and get fragments; I haven't tried that, though, and I suspect you'd actually get the same thing. This is an XY problem IMO. Please describe exactly what you're trying to accomplish, with examples, rather than continuing down this path. It sounds like you want spellcheck or similar. The _point_ behind the suggesters is that they handle multiple-word suggestions by returning the whole field. So putting long text fields into them is not going to work. Best, Erick

On Tue, Jun 16, 2015 at 1:46 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: in line: 2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Thanks Benedetti, I've changed to the AnalyzingInfixLookup approach, and it is able to start searching from the middle of the field. However, is it possible to make the suggester show only part of the content of the field (like 2 or 3 fields after), instead of the entire content/sentence, which can be quite long?

I assume you use fields in place of tokens. The answer is yes; I already said that in my previous mail. I invite you to read the answers and the linked documentation carefully! Regarding the excessive size of the tokens: this is weird. What are you trying to autocomplete? I really doubt it would be useful for a user to see super-long autocompleted terms. Cheers

Regards, Edwin

On 15 June 2015 at 17:33, Alessandro Benedetti benedetti.ale...@gmail.com wrote: ehehe Edwin, I think you should read again the document I linked some time ago: http://lucidworks.com/blog/solr-suggester/ The suggester you used is not meant to provide infix suggestions. The fuzzy suggester works on a fuzzy basis with the *starting* terms of a field's content. What you are looking for is actually one of the infix suggesters, for example the AnalyzingInfixLookup approach. When working with suggesters it is important first to make a distinction:
1) Returning the full content of the field (analysisInfix or Fuzzy)
2) Returning token(s) (Free Text Suggester)
Then the second difference is:
1) Infix suggestions ( from the middle of the field
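For reference, the AnalyzingInfixLookup approach Alessandro recommends is wired up roughly like this on Solr 4.7+ (field and suggester names here are placeholders, not from this thread):

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str> <!-- suggest from a short field, not full content -->
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">infixSuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components"><str>suggest</str></arr>
</requestHandler>
```

The Lucidworks article linked in the thread walks through the same component in more detail.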
Re: ZooKeeper connection refused
2015-06-17 15:34 GMT+02:00 Shalin Shekhar Mangar shalinman...@gmail.com: You are asking telnet to connect to zk1 on port 2181 but you have not specified the port to Solr. You should set ZK_HOST=zk1:2181,zk2:2181,zk3:2181 instead. I modified the ZK_HOST instance with the port, but the problem is not solved. Do you have any ideas?
Re: mapreduce job using soirj 5
I think there is some better classpath isolation options in the works for Hadoop. As it is, there is some harmonization that has to be done depending on versions used, and it can get tricky. - Mark On Wed, Jun 17, 2015 at 9:52 AM Erick Erickson erickerick...@gmail.com wrote: For sure there are a few rough edges here On Wed, Jun 17, 2015 at 12:28 AM, adfel70 adfe...@gmail.com wrote: We cannot downgrade httpclient in solrj5 because its using new features and we dont want to start altering solr code, anyway we thought about upgrading httpclient in hadoop but as Erick said its sounds more work than just put the jar in the data nodes. About that flag we tried it, hadoop even has an environment variable HADOOP_USER_CLASSPATH_FIRST but all our tests with that flag failed. We thought this is an issue that is more likely that solr users will encounter rather than cloudera users, so we will be glad for a more elegant solution or workaround than to replace the httpclient jar in the data nodes Thank you all for your responses -- View this message in context: http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199p4212350.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Mark about.me/markrmiller
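For readers comparing notes, these are the configuration knobs this thread is discussing; property names vary across Hadoop versions, and as reported above they did not resolve the reporter's conflict:

```xml
<!-- mapred-site.xml or per-job conf: prefer user jars on the task classpath -->
<property>
  <name>mapreduce.job.user.classpath.first</name> <!-- Hadoop 2.x name -->
  <value>true</value>
</property>
<!-- some older/vendor builds use mapreduce.task.classpath.user.precedence instead,
     as mentioned earlier in this thread -->
```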
Re: mapreduce job using soirj 5
For sure there are a few rough edges here On Wed, Jun 17, 2015 at 12:28 AM, adfel70 adfe...@gmail.com wrote: We cannot downgrade httpclient in solrj5 because its using new features and we dont want to start altering solr code, anyway we thought about upgrading httpclient in hadoop but as Erick said its sounds more work than just put the jar in the data nodes. About that flag we tried it, hadoop even has an environment variable HADOOP_USER_CLASSPATH_FIRST but all our tests with that flag failed. We thought this is an issue that is more likely that solr users will encounter rather than cloudera users, so we will be glad for a more elegant solution or workaround than to replace the httpclient jar in the data nodes Thank you all for your responses -- View this message in context: http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199p4212350.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Joins with comma separated values
You can potentially just use a text_general field, in which case your comma separated words will be effectively a multi-valued field. I believe that will work. As to how you want to use joins, that isn't possible. They are pseudo joins, not full joins. They will not be able to include data from the joined field in the result. Upayavira On jun 6, Advait Suhas Pandit wrote: Hi, We have some master data and some content data. Master data would be things like userid, name, email id etc. Our content data for example is a blog. The blog has certain fields which are comma separated ids that point to the master data. E.g. UserIDs of people who have commented on a particular blog can be found in the blog table in a comma separated field of userids. Similarly userids of people who have liked the blog can be found in a comma separated field of userids. How do I join this comma separated list of userids with the master data so that I can get the other details of the user such as name, email, picture etc? Thanks, Advait
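Assuming the comma-separated ids end up in a multiValued field (say commenter_ids, a hypothetical name) and user documents have an id field, the pseudo-join looks like this. As Upayavira notes, it can only filter one side by the other, not merge user fields into blog results:

```
# user documents whose id occurs in commenter_ids of blogs matching title:solr
q={!join from=commenter_ids to=id}title:solr

# blogs whose commenters include users named Advait
q={!join from=id to=commenter_ids}name:Advait
```

Pulling the user's name, email, etc. into the blog result itself still requires a second query (or denormalizing those fields into the blog documents at index time).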
Re: mapreduce job using soirj 5
We cannot downgrade httpclient in solrj5 because it's using new features, and we don't want to start altering Solr code. Anyway, we thought about upgrading httpclient in hadoop, but as Erick said it sounds like more work than just putting the jar on the data nodes. About that flag: we tried it (hadoop even has an environment variable HADOOP_USER_CLASSPATH_FIRST), but all our tests with that flag failed. We think this is an issue that solr users are more likely to encounter than cloudera users, so we would be glad for a more elegant solution or workaround than replacing the httpclient jar on the data nodes. Thank you all for your responses -- View this message in context: http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199p4212350.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr/ZK issues
Hi Folks, We are seeing the following in our logs on our Solr nodes after which Solr nodes go into multiple full GCs and eventually runs out of heap. We saw this ticket - https://issues.apache.org/jira/browse/SOLR-7338 - wondering that’s the one causing it. We are currently on 4.10.0 INFO - 2015-06-17 08:06:28.163; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@422f41e9 name:ZooKeeperConnection Watcher:got event WatchedEvent state:Expired type:None path:null path:null type:None INFO - 2015-06-17 08:06:28.163; org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper... INFO - 2015-06-17 08:06:28.166; org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection expired - starting a new one... INFO - 2015-06-17 08:06:28.171; org.apache.solr.common.cloud.ConnectionManager; Waiting for client to connect to ZooKeeper INFO - 2015-06-17 08:06:28.177; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@422f41e9 name:ZooKeeperConnection Watcher: got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2015-06-17 08:06:28.177; org.apache.solr.common.cloud.ConnectionManager; Client is connected to ZooKeeper INFO - 2015-06-17 08:06:28.178; org.apache.solr.common.cloud.ConnectionManager$1; Connection with ZooKeeper reestablished. 
INFO - 2015-06-17 08:06:28.178; org.apache.solr.common.cloud.DefaultConnectionStrategy; Reconnected to ZooKeeper
INFO - 2015-06-17 08:06:28.179; org.apache.solr.common.cloud.ConnectionManager; Connected:true
WARN - 2015-06-17 08:06:28.179; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=category coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=category_shadow coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=rules_shadow coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=rules coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=catalog_shadow coreNodeName=core_node2
WARN - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=catalog coreNodeName=core_node2
INFO - 2015-06-17 08:06:28.180; org.apache.solr.cloud.ZkController; publishing core=category state=down collection=category
INFO - 2015-06-17 08:06:28.180; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.186; org.apache.solr.cloud.ZkController; publishing core=category_shadow state=down collection=category_shadow
INFO - 2015-06-17 08:06:28.186; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.189; org.apache.solr.cloud.ZkController; publishing core=rules_shadow state=down collection=rules_shadow
INFO - 2015-06-17 08:06:28.189; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.191; org.apache.solr.cloud.ZkController; publishing core=rules state=down collection=rules
INFO - 2015-06-17 08:06:28.191; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.193; org.apache.solr.cloud.ZkController; publishing core=catalog_shadow state=down collection=catalog_shadow
INFO - 2015-06-17 08:06:28.193; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.194; org.apache.solr.cloud.ZkController; publishing core=catalog state=down collection=catalog
INFO - 2015-06-17 08:06:28.194; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - 2015-06-17 08:06:28.198; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
WARN - 2015-06-17 08:07:51.188; org.apache.solr.cloud.ZkController; org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/rules_shadow/leader_elect/shard1/election
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:290)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:287)
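The expiry-then-reconnect pattern in these logs is the classic symptom of a GC pause longer than the ZooKeeper session timeout. As a stopgap while the heap and GC behavior is investigated, the timeout can be raised in solr.xml. A sketch for Solr 4.x (zkClientTimeout is the standard setting name, but check the solr.xml format for your release; note a longer timeout only masks the pauses, it does not remove them):

```xml
<solrcloud>
  <!-- tolerate GC pauses of up to 60s before ZooKeeper expires the session -->
  <int name="zkClientTimeout">60000</int>
</solrcloud>
```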
Re: suggester returning stems instead of whole words
working in a tiny tmux window does have some disadvantages, such as losing one’s place in the file! the subject_autocomplete definition wasn’t inside fields. Now that it is, everything is working. thanks for listening Alistair -- mov eax,1 mov ebx,0 int 80h

On 17/06/2015 15:17, Alistair Young alistair.yo...@uhi.ac.uk wrote: looking at the schema browser, subject_autocomplete has a type of text_en rather than text_auto and all the terms are stemmed. Its contents are the same as the one it was copied from, dc.subject, which is text_en and stemmed.

On 17/06/2015 14:58, Erick Erickson erickerick...@gmail.com wrote: Hmmm, shouldn't be happening that way. Spellcheck is supposed to be looking at indexed terms. If you go into the admin/schema browser page and look at the new field, what are the terms in the index? They shouldn't be stemmed. And I always get confused where this <str name="spellcheck.dictionary">suggest</str> is supposed to point. Do you have any other component named suggest that you might be picking up? Best, Erick

On Wed, Jun 17, 2015 at 6:50 AM, Alistair Young alistair.yo...@uhi.ac.uk wrote: yep did both of those things. Getting the same results as using dc.subject

On 17/06/2015 14:44, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Did you change the SpellCheckComponent's configuration to use subject_autocomplete instead of dc.subject? After you made that change, did you invoke spellcheck.build=true to re-build the spellcheck index?

On Wed, Jun 17, 2015 at 7:06 PM, Alistair Young alistair.yo...@uhi.ac.uk wrote: copyField doesn't seem to fix the suggestion stemming.
Copying the field to another field of this type:

  <field name="subject_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
  <copyField source="dc.subject" dest="subject_autocomplete"/>
  <fieldType class="solr.TextField" name="text_auto" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

but I'm still getting stemmed suggestions after rebuilding the index. Alistair -- mov eax,1 mov ebx,0 int 80h

On 17/06/2015 11:28, Alistair Young alistair.yo...@uhi.ac.uk wrote: ah looks like I need to use copyField to get a non stemmed version of the suggester field Alistair -- mov eax,1 mov ebx,0 int 80h

On 17/06/2015 11:15, Alistair Young alistair.yo...@uhi.ac.uk wrote: I was wondering if there's a way to get the suggester to return whole words. Instead of returning 'technology', 'temperature' and 'tutorial', it's returning 'technolog', 'temperatur' and 'tutori' using this config:

  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
      <str name="field">dc.subject</str>
      <float name="threshold">0.005</float>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>
  <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

thanks, Alistair -- mov eax,1 mov ebx,0 int 80h -- Regards, Shalin Shekhar Mangar.
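Following Shalin's question in the thread (did you change the SpellCheckComponent to use subject_autocomplete?), the suggester also has to be pointed at the unstemmed copyField target rather than dc.subject. A sketch of the one-line change to the component quoted above:

```xml
<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <!-- changed: build suggestions from the unstemmed copyField target -->
    <str name="field">subject_autocomplete</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

After changing the field, rebuild the dictionary (e.g. spellcheck.build=true) so suggestions come from the new field's terms.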
Re: ManagedStopFilterFactory not accepting ignoreCase
Oh, I see you already did :) - thanks. - Steve On Jun 17, 2015, at 11:10 AM, Steve Rowe sar...@gmail.com wrote: Hi Mike, Looks like a bug to me - would you please create a JIRA? Thanks, Steve
Re: suggester returning stems instead of whole words
yep, 4.3.1. The API changed after that, so it's a matter of finding the time to rewrite the entire backend that uses it.

On 17/06/2015 16:55, Shalin Shekhar Mangar shalinman...@gmail.com wrote: You must be using an old version of Solr. Since Solr 4.8, the fields and types tags have been deprecated and you can place the field and field type definitions anywhere in the schema.xml. See http://issues.apache.org/jira/browse/SOLR-5228

On Wed, Jun 17, 2015 at 9:09 PM, Alistair Young alistair.yo...@uhi.ac.uk wrote: working in a tiny tmux window does have some disadvantages, such as losing one’s place in the file! the subject_autocomplete definition wasn’t inside fields. Now that it is, everything is working. thanks for listening Alistair -- mov eax,1 mov ebx,0 int 80h
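For readers on pre-4.8 schemas like this 4.3.1 setup: definitions must sit inside the <fields> and <types> wrappers for Solr to pick them up. A minimal sketch of the layout that resolved the thread (names taken from the thread; the analyzer body is abbreviated from the quoted config):

```xml
<schema name="example" version="1.4">
  <types>
    <fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <!-- must be inside <fields> on pre-4.8 schemas -->
    <field name="subject_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
  </fields>
  <copyField source="dc.subject" dest="subject_autocomplete"/>
</schema>
```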
Re: ManagedStopFilterFactory not accepting ignoreCase
Hi Mike, Looks like a bug to me - would you please create a JIRA? Thanks, Steve

On Jun 17, 2015, at 10:29 AM, Mike Thomsen mikerthom...@gmail.com wrote: We're running Solr 4.10.4 and getting this...

Caused by: java.lang.IllegalArgumentException: Unknown parameters: {ignoreCase=true}
        at org.apache.solr.rest.schema.analysis.BaseManagedTokenFilterFactory.init(BaseManagedTokenFilterFactory.java:46)
        at org.apache.solr.rest.schema.analysis.ManagedStopFilterFactory.init(ManagedStopFilterFactory.java:47)

This is the filter definition I used:

  <filter class="solr.ManagedStopFilterFactory" ignoreCase="true" managed="english"/>

Any ideas? Thanks, Mike
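Until the attribute bug is fixed, ignoreCase may be workable through the managed resource itself rather than as a factory attribute. A sketch of the managed stop-words JSON, assuming the 4.x managed-resources REST API (the endpoint path and exact JSON shape should be verified against the Reference Guide for your release; the word list here is just a placeholder):

```json
{
  "initArgs": { "ignoreCase": true },
  "managedList": [ "a", "an", "the" ]
}
```

This would be sent with PUT to something like /solr/<core>/schema/analysis/stopwords/english, followed by a core reload so the analyzer picks up the setting.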
Re: suggester returning stems instead of whole words
You must be using an old version of Solr. Since Solr 4.8, the fields and types tags have been deprecated and you can place the field and field type definitions anywhere in the schema.xml. See http://issues.apache.org/jira/browse/SOLR-5228

On Wed, Jun 17, 2015 at 9:09 PM, Alistair Young alistair.yo...@uhi.ac.uk wrote: working in a tiny tmux window does have some disadvantages, such as losing one’s place in the file! the subject_autocomplete definition wasn’t inside fields. Now that it is, everything is working. thanks for listening Alistair -- mov eax,1 mov ebx,0 int 80h

-- Regards, Shalin Shekhar Mangar.