Re: Data indexing is going too slow on single shard Why?
Okay, thanks Shawn. On Thu, Mar 26, 2015 at 12:25 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/26/2015 12:03 AM, Nitin Solanki wrote: Great, thanks Shawn. As you said - **For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB**. Therefore, if I have 204GB of data on a single server/shard, then 256GB is preferable, so searching will be fast and never slow down. Is that right? Obviously I cannot guarantee it, but I think it's extremely likely that with that much memory, performance will be very good. One other possibility, which is discussed on that wiki page I linked, is that your Java heap is almost exhausted and large amounts of time are spent in garbage collection. If you increase the heap from 4GB to 5GB and see performance improve, then that would be confirmed. There would be less memory available for caching, but constant garbage collection would be a much greater problem than the disk cache being too small. Thanks, Shawn
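For readers wanting to try Shawn's heap experiment, a minimal sketch of the change in solr.in.sh (the variable name matches the stock Solr start scripts; the 5GB figure is just the value discussed in this thread):

    # bump the heap from 4GB to 5GB; equal -Xms/-Xmx avoids resize pauses
    SOLR_JAVA_MEM="-Xms5g -Xmx5g"

If performance improves after this change, garbage collection pressure was the likely culprit, per Shawn's diagnosis.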
ZFS File System for SOLR 3.6 and SOLR 4
Hello, I am trying to use ZFS as the filesystem for my Linux environment. Are there any performance implications of using a filesystem other than ext3/ext4 with SOLR? Thanks in advance. Best Regards, Abhishek
SOLR Index in shared/Network folder
Greetings, I am trying to use a network shared location as my index directory. Are there any known problems with running a SOLR instance on a Network File System? Thanks in advance. Best Regards, Abhishek
Re: SOLR Index in shared/Network folder
On 3/27/2015 12:06 AM, abhi Abhishek wrote: Greetings, I am trying to use a network shared location as my index directory. Are there any known problems with running a SOLR instance on a Network File System? It is not recommended. You will probably need to change the lockType ... the default native probably will not work, and you might need to change it to none to get it working ... but that disables an important safety mechanism that prevents index corruption. http://stackoverflow.com/questions/9599529/solr-over-nfs-problems Thanks, Shawn
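For reference, the lockType Shawn mentions lives in the indexConfig section of solrconfig.xml. A minimal sketch of the change, with the caveat from his reply repeated (disabling locking removes a safeguard against index corruption):

    <indexConfig>
      <!-- default is native; none disables index locking entirely -->
      <lockType>none</lockType>
    </indexConfig>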
Database vs Solr : ID based filtering
Hi, Will ID-based filtering on Solr perform worse than a DB? <field name="id" type="string" indexed="true" stored="true"/> - http://localhost:8983/solr/select?q=*:*&fq=id:153 *OR* - select * from TABLE where id=153 With Regards Aman Tandon
Re: Database vs Solr : ID based filtering
so you'll end up forever invalidating your cache. What if we have 1 million IDs assigned to different users and each user performs the query on Solr daily? Will the entries be there forever? With Regards Aman Tandon On Fri, Mar 27, 2015 at 1:50 PM, Upayavira u...@odoko.co.uk wrote: The below won't perform well. You've used a filter query, which will be cached, so you'll end up forever invalidating your cache. Better would be http://localhost:8983/solr/select?q=id:153 Perhaps better still would be http://localhost:8983/solr/get?id=153 The latter is a "real time get" which will return a document that hasn't even been soft-committed yet. As to which performs better, I'd encourage you to set up a simple experiment and try it out. Upayavira On Fri, Mar 27, 2015, at 06:56 AM, Aman Tandon wrote: Hi, Will ID-based filtering on Solr perform worse than a DB? <field name="id" type="string" indexed="true" stored="true"/> - http://localhost:8983/solr/select?q=*:*&fq=id:153 *OR* - select * from TABLE where id=153 With Regards Aman Tandon
Re: SOLR 5.0.0 and Tomcat version ?
On 23/03/15 20:05, Erick Erickson wrote: you don't run a SQL engine from a servlet container, why should you run Solr that way? https://twitter.com/steff1193/status/580491034175660032 https://issues.apache.org/jira/browse/SOLR-7236?focusedCommentId=14383624page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14383624 etc Not that I want to start the discussion again. The war seems to be lost.
Tweaking SOLR memory and cull facet words
Hi, my SOLR 5 solrconfig.xml file contains the following lines: <!-- Faceting defaults --> <str name="facet">on</str> <str name="facet.field">text</str> <str name="facet.mincount">100</str> where the 'text' field contains thousands of words. When I start SOLR, the search engine takes several minutes to index the words in the 'text' field (although loading the browse template later only takes a few seconds, because the 'text' field has already been indexed). Here are my questions: - should I increase SOLR's JVM memory to make initial indexing faster? e.g., SOLR_JAVA_MEM=-Xms1024m -Xmx204800m in solr.in.sh - how can I cull facet words according to certain criteria (length, case, etc.)? For instance, my facets are the following: application (22427) inytapdf0 (22427) pdf (22427) the (22334) new (22131) herald (21983) york (21975) paris (21780) a (21692) and (21298) of (21288) i (21247) in (21062) to (20918) on (20899) m (20857) by (20733) de (20664) for (20580) at (20417) with (20371) ... Obviously, words such as the, i, to, m, etc. should not be indexed. Furthermore, I don't care about nouns. I am only interested in people and location names. Many thanks. Philippe
Re: Database vs Solr : ID based filtering
For a single where clause, an RDBMS with an index performs comparably to an inverted index. The inverted index wins on multiple 'where' clauses, where it doesn't need composite indices; the multivalued field is also its intrinsic advantage. More details at http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal On Fri, Mar 27, 2015 at 9:56 AM, Aman Tandon amantandon...@gmail.com wrote: Hi, Will ID-based filtering on Solr perform worse than a DB? <field name="id" type="string" indexed="true" stored="true"/> - http://localhost:8983/solr/select?q=*:*&fq=id:153 *OR* - select * from TABLE where id=153 With Regards Aman Tandon -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Database vs Solr : ID based filtering
The below won't perform well. You've used a filter query, which will be cached, so you'll end up forever invalidating your cache. Better would be http://localhost:8983/solr/select?q=id:153 Perhaps better still would be http://localhost:8983/solr/get?id=153 The latter is a "real time get" which will return a document that hasn't even been soft-committed yet. As to which performs better, I'd encourage you to set up a simple experiment and try it out. Upayavira On Fri, Mar 27, 2015, at 06:56 AM, Aman Tandon wrote: Hi, Will ID-based filtering on Solr perform worse than a DB? <field name="id" type="string" indexed="true" stored="true"/> - http://localhost:8983/solr/select?q=*:*&fq=id:153 *OR* - select * from TABLE where id=153 With Regards Aman Tandon
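In SolrJ terms, the three lookups Upayavira compares might look like the sketch below (URL, core name, and ID are placeholder example values; the /get handler must be defined in solrconfig.xml, as it is in the stock configs):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class IdLookup {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");

            // 1) filter query: fills the filterCache with a near-useless one-document entry
            SolrQuery filtered = new SolrQuery("*:*");
            filtered.addFilterQuery("id:153");

            // 2) plain query: no filterCache entry
            SolrQuery plain = new SolrQuery("id:153");

            // 3) real-time get: also sees documents not yet soft-committed
            SolrQuery rtg = new SolrQuery();
            rtg.setRequestHandler("/get");
            rtg.set("id", "153");

            QueryResponse rsp = client.query(rtg);
            System.out.println(rsp.getResponse());
            client.close();
        }
    }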
Re: Solr replicas going in recovering state during heavy indexing
I think it is very likely due to Solr nodes losing their ZK connections (after the timeout). We have experienced that a lot. One thing you want to do is make sure your ZK servers do not run on the same machines as your Solr nodes - that helped us a lot. On 24/03/15 13:57, Gopal Jee wrote: Hi We have a large solrcloud cluster. We have observed that during heavy indexing, a large number of replicas go into recovering or down state. What could be the possible reason and/or fix for this issue? Gopal
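If raising the ZK session timeout is a short-term option, the setting involved is zkClientTimeout; a sketch of where it lives in the new-style solr.xml (30 seconds is the commonly shipped default - note that raising it only masks long pauses, it does not cure them):

    <solr>
      <solrcloud>
        <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
      </solrcloud>
    </solr>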
Unable to perform search query after changing uniqueKey
Hi everyone, I've changed my uniqueKey to another name, instead of using id, in schema.xml. However, after I have done the indexing (the indexing is successful), I'm not able to perform a search query on it. It gives the error java.lang.NullPointerException. Is there any other place I need to configure, besides changing the uniqueKey field in schema.xml? Regards, Edwin
Re: Unable to perform search query after changing uniqueKey
Hi Edwin, please provide some more detail about your context (e.g. the complete stacktrace, the query you're issuing). Best, Andrea On 03/27/2015 09:38 AM, Zheng Lin Edwin Yeo wrote: Hi everyone, I've changed my uniqueKey to another name, instead of using id, in schema.xml. However, after I have done the indexing (the indexing is successful), I'm not able to perform a search query on it. It gives the error java.lang.NullPointerException. Is there any other place I need to configure, besides changing the uniqueKey field in schema.xml? Regards, Edwin
Re: Solr advanced StopFilterFactory
Alex - that's definitely possible, with performance being the main consideration here. But since this is for query-time stop words, maybe your fronting application could instead take the user's list and remove those words from the query before sending it to Solr? I'm curious what the ultimate goal / use case is for this feature, which may help us better guide you on ways to do what you need. — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com On Mar 27, 2015, at 8:32 AM, Alex Sylka sylkaa...@gmail.com wrote: We need an advanced stop words filter in Solr. We need stopwords to be stored in a db, with the ability for users to change them (each user should have their own stopwords). That's why I am thinking about sending stop words to Solr from our app, or connecting to our db from Solr and using the updated stop words in a custom StopFilterFactory. Each user will have their own stopwords list, stored in a MySQL stopwords table (id, user_id, stopword). We have the following index structure. This index will store data for all users. <field name="user_id" type="int" indexed="true" stored="true" required="true" multiValued="false"/> <field name="tag_name" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/> ... <field name="tag_description" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/> I am not sure how to achieve the behaviour described above, but I am thinking about writing my own custom StopFilterFactory which will grab stopwords from the db and use different stopwords per user while indexing their documents. What can you suggest? Is that possible? Am I on the right way?
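Erik's suggestion of stripping the user's stop words in the fronting application could be as small as the sketch below (the per-user lookup from the MySQL stopwords table is left abstract; all names here are hypothetical):

    import java.util.*;
    import java.util.stream.Collectors;

    public class UserStopwords {
        // drop any whitespace-separated term that appears in the user's stopword set
        public static String strip(String query, Set<String> userStops) {
            return Arrays.stream(query.split("\\s+"))
                    .filter(t -> !userStops.contains(t.toLowerCase(Locale.ROOT)))
                    .collect(Collectors.joining(" "));
        }

        public static void main(String[] args) {
            // in practice, loaded from the (id, user_id, stopword) table per user
            Set<String> stops = new HashSet<>(Arrays.asList("the"));
            System.out.println(strip("the new york herald", stops)); // -> "new york herald"
        }
    }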
Re: Installing the auto-phrase-tokenfilter
Hi, I never used that but I think you should: - get the source code / clone the repository - run the ant build (I see a dist target) - put the artifact in your core / shared lib dir so Solr can see that library - have a look at the README [1] for how to use it Best, Andrea [1] https://github.com/LucidWorks/auto-phrase-tokenfilter/blob/master/README.md On 03/27/2015 01:02 PM, afrooz wrote: I am also having this problem. Can anyone help us?
Solr advanced StopFilterFactory
We need an advanced stop words filter in Solr. We need stopwords to be stored in a db, with the ability for users to change them (each user should have their own stopwords). That's why I am thinking about sending stop words to Solr from our app, or connecting to our db from Solr and using the updated stop words in a custom StopFilterFactory. Each user will have their own stopwords list, stored in a MySQL stopwords table (id, user_id, stopword). We have the following index structure. This index will store data for all users. <field name="user_id" type="int" indexed="true" stored="true" required="true" multiValued="false"/> <field name="tag_name" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/> ... <field name="tag_description" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/> I am not sure how to achieve the behaviour described above, but I am thinking about writing my own custom StopFilterFactory which will grab stopwords from the db and use different stopwords per user while indexing their documents. What can you suggest? Is that possible? Am I on the right way?
Re: Installing the auto-phrase-tokenfilter
I am also having this problem. Can anyone help us?
Re: Tweaking SOLR memory and cull facet words
On 3/27/2015 4:14 AM, phi...@free.fr wrote: Hi, my SOLR 5 solrconfig.xml file contains the following lines: <!-- Faceting defaults --> <str name="facet">on</str> <str name="facet.field">text</str> <str name="facet.mincount">100</str> where the 'text' field contains thousands of words. When I start SOLR, the search engine takes several minutes to index the words in the 'text' field (although loading the browse template later only takes a few seconds, because the 'text' field has already been indexed). Here are my questions: - should I increase SOLR's JVM memory to make initial indexing faster? e.g., SOLR_JAVA_MEM=-Xms1024m -Xmx204800m in solr.in.sh - how can I cull facet words according to certain criteria (length, case, etc.)? For instance, my facets are the following: application (22427) inytapdf0 (22427) pdf (22427) the (22334) new (22131) herald (21983) york (21975) paris (21780) a (21692) and (21298) of (21288) i (21247) in (21062) to (20918) on (20899) m (20857) by (20733) de (20664) for (20580) at (20417) with (20371) ... Obviously, words such as the, i, to, m, etc. should not be indexed. Furthermore, I don't care about nouns. I am only interested in people and location names. Starting Solr does not index anything, unless you are talking about one of the sidecar indexes for spelling correction or suggestions. You must send indexing requests to Solr, and if you are experiencing slow indexing, chances are that it's because of slowness in obtaining data from the source, not Solr ... or that you are indexing with a single thread. If you can set up multiple threads or processes that index in parallel, it should go faster. Thousands of terms are not hard for Solr to handle at all. When the number of terms gets into the millions or billions, then it starts becoming a hard problem. If you use the stopword filter on the index analysis chain for the field that you are using for facets, then all the stopwords will be removed from the facets. That would change how searches work on the field, so you will probably want to use copyField to create a new field that you use for faceting. There are other filters that can do things you have mentioned, like LengthFilterFactory: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory As far as Java heap sizing, trial and error is about the only way to find the right size. http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap Thanks, Shawn
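A sketch of the copyField arrangement Shawn describes, assuming the source field is named text (the facet field name, analyzer chain, and length bounds below are illustrative choices, not requirements):

    <fieldType name="text_facet" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LengthFilterFactory" min="3" max="50"/>
      </analyzer>
    </fieldType>

    <field name="text_facet" type="text_facet" indexed="true" stored="false" multiValued="true"/>
    <copyField source="text" dest="text_facet"/>

Faceting on text_facet then skips stopwords and very short tokens, while searches against text behave exactly as before.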
Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)
On 3/27/2015 7:07 AM, Russell Taylor wrote: Hi Shawn, thanks for the quick reply. I've looked at both methods and I think that they won't work, for a number of reasons: 1) uniqueKey: I could use the uniqueKey and overwrite the original document, but I need to remove the documents which are not on my new input list, and the issue with the uniqueKey method is that I don't know what to delete. Documents on the index: docs: [ { id:1 keyField:A },{ id:2 keyField:A },{ id:3 keyField:B } ] New documents to go on the index: docs: [ { id:1 keyField:A },{ id:3 keyField:B } ] I would never know that id:2 should be deleted. (On some new document lists the delete list could be in the millions.) 2) openSearcher: My openSearcher is set to false and I've also commented out autoSoftCommit so I don't get a partial list being returned on a query. <!-- <autoSoftCommit> <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime> </autoSoftCommit> --> So is there another way to keep the original set of documents until the new set has been added to the index? If you are 100% in control of when commits with openSearcher=true are sent, which it sounds like you probably are, then you can do anything you want from the start of indexing until commit time, and the user will never see any of it until the commit happens. That allows the following relatively simple paradigm: 1) Delete LOTS of stuff, or perhaps everything in the index, with a deleteByQuery of *:* (for all documents). 2) Index everything you need to index. 3) Commit. Thanks, Shawn
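In SolrJ, Shawn's three-step paradigm might look like this sketch (it relies on the thread's premise that no openSearcher=true commits fire in between; the URL is a placeholder and newDocs stands for the replacement document set):

    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class AtomicSwap {
        public static void swap(List<SolrInputDocument> newDocs) throws Exception {
            HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
            client.deleteByQuery("*:*"); // 1) delete everything - invisible until commit
            client.add(newDocs);         // 2) index the replacement set - still invisible
            client.commit();             // 3) one commit exposes the whole swap atomically
            client.close();
        }
    }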
RE: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)
Hi Shawn, thanks for the quick reply. I've looked at both methods and I think that they won't work, for a number of reasons: 1) uniqueKey: I could use the uniqueKey and overwrite the original document, but I need to remove the documents which are not on my new input list, and the issue with the uniqueKey method is that I don't know what to delete. Documents on the index: docs: [ { id:1 keyField:A },{ id:2 keyField:A },{ id:3 keyField:B } ] New documents to go on the index: docs: [ { id:1 keyField:A },{ id:3 keyField:B } ] I would never know that id:2 should be deleted. (On some new document lists the delete list could be in the millions.) 2) openSearcher: My openSearcher is set to false and I've also commented out autoSoftCommit so I don't get a partial list being returned on a query. <!-- <autoSoftCommit> <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime> </autoSoftCommit> --> So is there another way to keep the original set of documents until the new set has been added to the index? Thanks Russ. -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: 26 March 2015 16:06 To: solr-user@lucene.apache.org Subject: Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs) On 3/26/2015 9:53 AM, Russell Taylor wrote: I have an index which is made up of groups of documents, each group defined by a field called keyField (keyField:A). I need to delete all the keyField:A documents and replace them with a brand new set, without the index ever returning zero documents on a query. At the moment I deleteByQuery:keyField:A and then insert a SolrInputDocument list via SolrJ into my index. I have a small time period where somebody doing q=keyField:A can be returned an empty list. FYI: The keyField group might be just 100 documents or up to 10 million. As long as you don't have any commits with openSearcher=true happening between the delete and the insert, that would work ... but why go through the manual delete if you don't have to? If you define a suitable uniqueKey field in your schema, simply indexing a new document with the same value in the uniqueKey field as an existing document will delete the old document. https://wiki.apache.org/solr/UniqueKey Thanks, Shawn
RE: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)
Yes, that works, and now I have a better understanding of the soft and hard commits to boot. Thanks again Shawn. Russ. -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: 27 March 2015 13:22 To: solr-user@lucene.apache.org Subject: Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs) On 3/27/2015 7:07 AM, Russell Taylor wrote: Hi Shawn, thanks for the quick reply. I've looked at both methods and I think that they won't work, for a number of reasons: 1) uniqueKey: I could use the uniqueKey and overwrite the original document, but I need to remove the documents which are not on my new input list, and the issue with the uniqueKey method is that I don't know what to delete. Documents on the index: docs: [ { id:1 keyField:A },{ id:2 keyField:A },{ id:3 keyField:B } ] New documents to go on the index: docs: [ { id:1 keyField:A },{ id:3 keyField:B } ] I would never know that id:2 should be deleted. (On some new document lists the delete list could be in the millions.) 2) openSearcher: My openSearcher is set to false and I've also commented out autoSoftCommit so I don't get a partial list being returned on a query. <!-- <autoSoftCommit> <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime> </autoSoftCommit> --> So is there another way to keep the original set of documents until the new set has been added to the index? If you are 100% in control of when commits with openSearcher=true are sent, which it sounds like you probably are, then you can do anything you want from the start of indexing until commit time, and the user will never see any of it until the commit happens. That allows the following relatively simple paradigm: 1) Delete LOTS of stuff, or perhaps everything in the index, with a deleteByQuery of *:* (for all documents). 2) Index everything you need to index. 3) Commit. Thanks, Shawn
Re: Tweaking SOLR memory and cull facet words
On 3/27/2015 8:10 AM, phi...@free.fr wrote: You must send indexing requests to Solr. Are you referring to posting <add>...</add> queries to SOLR, or to something else? If you can set up multiple threads or processes... How do you do that? Yes, I am referring to posting requests to the /update handler. Since you would be writing the program, making it multithreaded or multi-process is up to you and the features of the language you are writing in. https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory Can you update the stopwords.txt file, and then re-index the documents? How? http://wiki.apache.org/solr/HowToReindex Thanks, Shawn
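One way to get the parallel indexing Shawn describes without hand-rolling threads is SolrJ's ConcurrentUpdateSolrClient, which buffers documents and drains the queue with several background threads; a minimal sketch (URL, queue size, thread count, and field values are arbitrary example choices):

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ParallelIndexer {
        public static void main(String[] args) throws Exception {
            // buffer up to 10000 docs, send them with 4 parallel threads
            ConcurrentUpdateSolrClient client =
                new ConcurrentUpdateSolrClient("http://localhost:8983/solr/collection1", 10000, 4);
            for (int i = 0; i < 100000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                doc.addField("text", "document body " + i);
                client.add(doc);
            }
            client.blockUntilFinished(); // wait for the queue to drain
            client.commit();
            client.close();
        }
    }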
Re: Installing the auto-phrase-tokenfilter
Thanks. My main issue is that I am a .NET developer, but I need to use this class within Solr and call it somehow from .NET. The issue is that I want the jar file built from this source code; from my searches I think I have to install Ant and run it within Eclipse. I tried creating a jar file with the java command, but those jar files do not seem to work when I use them within Solr. I have a question: if there are 3 classes within the source, do I need a jar file for each class, or should I generate one jar with all of them? And if within Solr it is referenced as class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory", what should the name of my jar file be? The name written in the build is Auto-Phrase-TokenFilter. I am confused, please explain it to me. Thank you in advance
Re: Tweaking SOLR memory and cull facet words
Hi Shawn, You must send indexing requests to Solr. Are you referring to posting <add>...</add> queries to SOLR, or to something else? If you can set up multiple threads or processes... How do you do that? https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory Can you update the stopwords.txt file, and then re-index the documents? How? Many thanks. Philippe - Original Message - From: Shawn Heisey apa...@elyograg.org To: solr-user@lucene.apache.org Sent: Friday, March 27, 2015 14:38:20 Subject: Re: Tweaking SOLR memory and cull facet words On 3/27/2015 4:14 AM, phi...@free.fr wrote: Hi, my SOLR 5 solrconfig.xml file contains the following lines: <!-- Faceting defaults --> <str name="facet">on</str> <str name="facet.field">text</str> <str name="facet.mincount">100</str> where the 'text' field contains thousands of words. When I start SOLR, the search engine takes several minutes to index the words in the 'text' field (although loading the browse template later only takes a few seconds, because the 'text' field has already been indexed). Here are my questions: - should I increase SOLR's JVM memory to make initial indexing faster? e.g., SOLR_JAVA_MEM=-Xms1024m -Xmx204800m in solr.in.sh - how can I cull facet words according to certain criteria (length, case, etc.)? For instance, my facets are the following: application (22427) inytapdf0 (22427) pdf (22427) the (22334) new (22131) herald (21983) york (21975) paris (21780) a (21692) and (21298) of (21288) i (21247) in (21062) to (20918) on (20899) m (20857) by (20733) de (20664) for (20580) at (20417) with (20371) ... Obviously, words such as the, i, to, m, etc. should not be indexed. Furthermore, I don't care about nouns. I am only interested in people and location names. Starting Solr does not index anything, unless you are talking about one of the sidecar indexes for spelling correction or suggestions. You must send indexing requests to Solr, and if you are experiencing slow indexing, chances are that it's because of slowness in obtaining data from the source, not Solr ... or that you are indexing with a single thread. If you can set up multiple threads or processes that index in parallel, it should go faster. Thousands of terms are not hard for Solr to handle at all. When the number of terms gets into the millions or billions, then it starts becoming a hard problem. If you use the stopword filter on the index analysis chain for the field that you are using for facets, then all the stopwords will be removed from the facets. That would change how searches work on the field, so you will probably want to use copyField to create a new field that you use for faceting. There are other filters that can do things you have mentioned, like LengthFilterFactory: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory As far as Java heap sizing, trial and error is about the only way to find the right size. http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap Thanks, Shawn
Re: ZFS File System for SOLR 3.6 and SOLR 4
On 3/27/2015 12:30 AM, abhi Abhishek wrote: I am trying to use ZFS as the filesystem for my Linux environment. Are there any performance implications of using a filesystem other than ext3/ext4 with SOLR? That should work with no problem. The only time Solr tends to have problems is if you try to use a network filesystem. As long as it's a local filesystem and it implements everything a program can typically expect from a local filesystem, Solr should work perfectly. Because of the compatibility problems that the ZFS license has with the GPL, ZFS on Linux is probably not as well tested as other filesystems like ext4, xfs, or btrfs, but I have not heard about any big problems, so it's probably safe. Thanks, Shawn
Re: Installing the auto-phrase-tokenfilter
On 3/27/2015 7:45 AM, afrooz wrote: My main issue is that I am a .NET developer, but I need to use this class within Solr and call it somehow from .NET. The issue is that I want the jar file built from this source code; from my searches I think I have to install Ant and run it within Eclipse. I tried creating a jar file with the java command, but those jar files do not seem to work when I use them within Solr. I have a question: if there are 3 classes within the source, do I need a jar file for each class, or should I generate one jar with all of them? And if within Solr it is referenced as class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory", what should the name of my jar file be? The name written in the build is Auto-Phrase-TokenFilter. I am confused, please explain it to me. This code is from LucidWorks, not the Solr project. You'll need to talk to them for help on it. One avenue is their issue tracker, but if they run their project like we run ours, they probably prefer that you ask on a mailing list or some other kind of support forum before you file an issue. I do not know where those resources might be. There are a number of LucidWorks employees on this mailing list; perhaps one of them might be able to direct you. https://github.com/LucidWorks/auto-phrase-tokenfilter/issues Thanks, Shawn
Re: SOLR Index in shared/Network folder
Several years ago, I accidentally put Solr indexes on an NFS volume and it was 100X slower. If you have enough RAM, query speed should be OK, but startup time (loading indexes into file buffers) could be really long. Indexing could be quite slow. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Mar 26, 2015, at 11:31 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/27/2015 12:06 AM, abhi Abhishek wrote: Greetings, I am trying to use a network shared location as my index directory. Are there any known problems with running a SOLR instance on a Network File System? It is not recommended. You will probably need to change the lockType ... the default native probably will not work, and you might need to change it to none to get it working ... but that disables an important safety mechanism that prevents index corruption. http://stackoverflow.com/questions/9599529/solr-over-nfs-problems Thanks, Shawn
Re: Solr advanced StopFilterFactory
The main goal is to allow each user to use their own stop words list. For example, a user types "th" and will see the following results in his terms search: the, the one, the, then, then, then, and. But the user has the stop word "the" and wants to get these results instead: then, then, and.
Re: Can SOLR custom analyzer access another field's value?
You could pre-process the field values in an update processor. You can even write a snippet in JavaScript. You could check one field and then redirect a field to an alternate field which has a different analyzer. What expectations do you have as to what analysis should occur at query time? -- Jack Krupansky On Fri, Mar 27, 2015 at 12:22 PM, Alex Sylka sylkaa...@gmail.com wrote: I am trying to write a custom analyzer whose execution is determined by the value of another field within the document. For example, if the locale field in the document has 'de' as the value, then the analyzer would use the German set of tokenizers/filters to process the value of a field. My question is: how can a custom analyzer access the value of another field (in this case the locale field) within a document, while analyzing the value of a specific field? There is a solution where we prepend the locale value to the field's value, like de|fieldvalue; the custom analyzer can then extract the locale while analyzing the field value. This seems like a dirty solution. Is there a better one?
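A sketch of the JavaScript update-processor route Jack mentions, using StatelessScriptUpdateProcessorFactory (the chain name, script name, and field names are illustrative; the chain still has to be referenced from your update handler). In solrconfig.xml:

    <updateRequestProcessorChain name="locale-routing">
      <processor class="solr.StatelessScriptUpdateProcessorFactory">
        <str name="script">locale-routing.js</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

and in conf/locale-routing.js:

    function processAdd(cmd) {
      var doc = cmd.solrDoc; // an org.apache.solr.common.SolrInputDocument
      if ("de" == doc.getFieldValue("locale")) {
        // redirect the text to a field whose analyzer is German
        doc.addField("text_de", doc.getFieldValue("text"));
        doc.removeField("text");
      }
    }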
Can SOLR custom analyzer access another field's value?
I am trying to write a custom analyzer whose execution is determined by the value of another field within the document. For example, if the locale field in the document has 'de' as the value, then the analyzer would use the German set of tokenizers/filters to process the value of a field. My question is: how can a custom analyzer access the value of another field (in this case the locale field) within a document, while analyzing the value of a specific field? There is a solution where we prepend the locale value to the field's value, like de|fieldvalue; the custom analyzer can then extract the locale while analyzing the field value. This seems like a dirty solution. Is there a better one?
Re: solr server datetime
Why do you want to in the first place? I ask because it's a common trap to think the server time is something that is useful... That said, it would require a little fiddling, but you can return the number of milliseconds since January 1, 1970 (the standard Unix epoch) by adding ms(NOW) to your fl parameter. The general case here is that you can add the results of any function query to the fl list. You could also use a DocTransformer; here's a place to start: https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents There may be more elegant ways, but that one is easy. Best, Erick On Thu, Mar 26, 2015 at 8:39 PM, fjq fquint...@gmail.com wrote: Is it possible to retrieve the server datetime?
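Concretely, Erick's fl suggestion amounts to adding an aliased function query as a pseudo-field, e.g. (server_time is just an illustrative alias name):

    http://localhost:8983/solr/collection1/select?q=*:*&fl=id,server_time:ms(NOW)

Each returned document then carries the server's current epoch milliseconds in the server_time pseudo-field.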
Re: SOLR Index in shared/Network folder
To pile on: If you're talking about pointing two Solr instances at the _same_ index, it doesn't matter whether you are on NFS or not; you'll have all sorts of problems. And if this is a SolrCloud installation, it's particularly hard to get right. Please do not do this unless you have a very good reason, and please tell us what the reason is so we can perhaps suggest alternatives. Best, Erick On Fri, Mar 27, 2015 at 8:08 AM, Walter Underwood wun...@wunderwood.org wrote: Several years ago, I accidentally put Solr indexes on an NFS volume and it was 100X slower. If you have enough RAM, query speed should be OK, but startup time (loading indexes into file buffers) could be really long. Indexing could be quite slow. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Mar 26, 2015, at 11:31 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/27/2015 12:06 AM, abhi Abhishek wrote: Greetings, I am trying to use a network shared location as my index directory. Are there any known problems with running a SOLR instance on a Network File System? It is not recommended. You will probably need to change the lockType ... the default native probably will not work, and you might need to change it to none to get it working ... but that disables an important safety mechanism that prevents index corruption. http://stackoverflow.com/questions/9599529/solr-over-nfs-problems Thanks, Shawn
Re: Retrieving list of words for highlighting
There's a JIRA ( https://issues.apache.org/jira/browse/SOLR-4722 ) describing a highlighter which returns term positions rather than snippets, which could then be mapped to the matching words in the indexed document (assuming that it's stored or that you have a copy elsewhere). -Simon On Wed, Mar 25, 2015 at 7:30 PM, Damien Dykman damien.dyk...@gmail.com wrote: In Solr 5 (or 4), is there an easy way to retrieve the list of words to highlight? Use case: allow an external application to highlight the matching words of a matching document, rather than using the highlighted snippets returned by Solr. Thanks, Damien
Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)
You can simplify things a bit by indexing a batch number guaranteed to be different between two runs for the same keyField. In fact I'd make sure it was unique amongst all my runs. Simplest is a timestamp (assuming you don't start two batches within a millisecond!). So it looks like this: get a new timestamp; add it to _every_ doc in the current run; issue a delete-by-query like q=keyfield:A AND timestamp:[* TO timestamp}; commit. As Shawn says, you have to very carefully control the commits. And also note that the curly brace at the end is NOT a typo; it excludes the endpoint. Best, Erick On Fri, Mar 27, 2015 at 7:01 AM, Russell Taylor russell.tay...@interactivedata.com wrote: Yes, that works, and now I have a better understanding of the soft and hard commits to boot. Thanks again Shawn. Russ. -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: 27 March 2015 13:22 To: solr-user@lucene.apache.org Subject: Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs) On 3/27/2015 7:07 AM, Russell Taylor wrote: Hi Shawn, thanks for the quick reply. I've looked at both methods and I think that they won't work, for a number of reasons: 1) uniqueKey: I could use the uniqueKey and overwrite the original document, but I need to remove the documents which are not on my new input list, and the issue with the uniqueKey method is that I don't know what to delete. Documents on the index: docs: [ { id:1 keyField:A },{ id:2 keyField:A },{ id:3 keyField:B } ] New documents to go on the index: docs: [ { id:1 keyField:A },{ id:3 keyField:B } ] I would never know that id:2 should be deleted. (On some new document lists the delete list could be in the millions.) 2) openSearcher: My openSearcher is set to false and I've also commented out autoSoftCommit so I don't get a partial list being returned on a query. <!-- <autoSoftCommit> <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime> </autoSoftCommit> --> So is there another way to keep the original set of documents until the new set has been added to the index? If you are 100% in control of when commits with openSearcher=true are sent, which it sounds like you probably are, then you can do anything you want from the start of indexing until commit time, and the user will never see any of it until the commit happens. That allows the following relatively simple paradigm: 1) Delete LOTS of stuff, or perhaps everything in the index, with a deleteByQuery of *:* (for all documents). 2) Index everything you need to index. 3) Commit. Thanks, Shawn
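Erick's batch-timestamp pattern, sketched in SolrJ (the batch_l field name is hypothetical, and note the exclusive } upper bound, exactly as he describes):

    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchSwap {
        public static void reload(List<SolrInputDocument> docs) throws Exception {
            HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
            long batch = System.currentTimeMillis(); // new timestamp for this run
            for (SolrInputDocument doc : docs) {
                doc.setField("batch_l", batch);      // stamp every doc in the run
                client.add(doc);
            }
            // delete older runs for this key, excluding the batch just indexed
            client.deleteByQuery("keyField:A AND batch_l:[* TO " + batch + "}");
            client.commit();                         // commits carefully controlled, per Shawn
            client.close();
        }
    }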
Re: Unable to perform search query after changing uniqueKey
You say you re-indexed; did you _completely_ remove the data directory first, i.e. the parent of the index and, maybe, tlog directories? I've occasionally seen remnants of old definitions pollute the new one, and since uniqueKey is so fundamental, I can see it being a problem. Best, Erick On Fri, Mar 27, 2015 at 1:42 AM, Andrea Gazzarini a.gazzar...@gmail.com wrote: Hi Edwin, please provide some more detail about your context (e.g. the complete stacktrace, the query you're issuing). Best, Andrea On 03/27/2015 09:38 AM, Zheng Lin Edwin Yeo wrote: Hi everyone, I've changed my uniqueKey to another name, instead of using id, in schema.xml. However, after I have done the indexing (the indexing is successful), I'm not able to perform a search query on it. It gives the error java.lang.NullPointerException. Is there any other place I need to configure, besides changing the uniqueKey field in schema.xml? Regards, Edwin
SOLR terms component and finding least frequent terms
Dear SOLR users, I have been using the /terms component to find low-occurrence terms in a large SOLR index, and this works very well, but it is not possible to filter (fq) the results, so you are stuck analyzing the whole index. Another option might be SOLR faceting, but I don't see how to easily produce least-common facets. Does anyone have experience finding infrequent terms through the TermsComponent or via faceting? Sorry if this is an odd request, but being able to perform this sort of analysis would be very useful. Paul
Re: solr server datetime
Erick, Thank you very much; the ms(NOW) was all I needed. Best, Fabricio On Fri, Mar 27, 2015 at 15:26, Erick Erickson [via Lucene] wrote: Why do you want to in the first place? I ask because it's a common trap to think the server time is something that is useful... That said, it would require a little fiddling, but you can return the number of milliseconds since January 1, 1970 (the standard Unix epoch) by adding ms(NOW) to your fl parameter. The general case here is that you can add the results of any function query to the fl list. You could also use a DocTransformer; here's a place to start: https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents There may be more elegant ways, but that one is easy. Best, Erick On Thu, Mar 26, 2015 at 8:39 PM, fjq wrote: Is it possible to retrieve the server datetime?
Solr 5.0.0 and HDFS
I just started up a two shard cluster on two machines using HDFS. When I started to index documents, the log shows errors like this. They repeat when I execute searches. All seems well - searches and indexing appear to be working. Possibly a configuration issue? My HDFS config: <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory"> <bool name="solr.hdfs.blockcache.enabled">true</bool> <int name="solr.hdfs.blockcache.slab.count">160</int> <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool> <int name="solr.hdfs.blockcache.blocksperbank">16384</int> <bool name="solr.hdfs.blockcache.read.enabled">true</bool> <bool name="solr.hdfs.blockcache.write.enabled">false</bool> <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool> <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">64</int> <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">512</int> <str name="solr.hdfs.home">hdfs://nameservice1:8020/solr5</str> <str name="solr.hdfs.confdir">/etc/hadoop/conf.cloudera.hdfs1</str> </directoryFactory> Thank you! -Joe java.lang.IllegalStateException: file: BlockDirectory(HdfsDirectory@799d5a0e lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@49838b82) appears both in delegate and in cache: cache=[_25.fnm, _2d.si, _2e.nvd, _2b.si, _28.tvx, _2c.tvx, _1t.si, _27.nvd, _2b.tvd, _2d_Lucene50_0.pos, _23.nvd, _28_Lucene50_0.doc, _28_Lucene50_0.dvd, _2d.fdt, _2c_Lucene50_0.pos, _23.fdx, _2b_Lucene50_0.doc, _2d.nvm, _28.nvd, _23.fnm, _2b_Lucene50_0.tim, _2e.fdt, _2d_Lucene50_0.doc, _2b_Lucene50_0.dvd, _2d_Lucene50_0.dvd, _2b.nvd, _2g.tvx, _28_Lucene50_0.dvm, _1v_Lucene50_0.tip, _2e_Lucene50_0.dvm, _2e_Lucene50_0.pos, _2g.fdx, _2e.nvm, _2f.fdx, _1s.tvd, _23.nvm, _27.nvm, _1s_Lucene50_0.tip, _2c.fnm, _2b.fdt, _2d.fdx, _2c.fdx, _2c.nvm, _2e.fnm, _2d_Lucene50_0.dvm, _28.nvm, _28.fnm, _2b_Lucene50_0.tip, _2e_Lucene50_0.dvd, _2c.si, _2f.fdt, _2b.fnm, _2e_Lucene50_0.tip, _28.si, _28_Lucene50_0.tip, _2f.tvd, _2d_Lucene50_0.tim, _2f.tvx, _2b_Lucene50_0.pos, _2e.fdx, _28.fdx, _2c_Lucene50_0.dvd, _2g.tvd, _2c_Lucene50_0.tim, _2b.nvm, _23.fdt, _1s_Lucene50_0.tim, _28_Lucene50_0.tim, _2c_Lucene50_0.doc, _28.tvd, _2b.tvx, _2c.nvd, _2b.fdx, _2c_Lucene50_0.tip, _2e_Lucene50_0.doc, _2e_Lucene50_0.tim, _2c.fdt, _27.tvd, _2d.tvd, _2d.tvx, _28_Lucene50_0.pos, _2b_Lucene50_0.dvm, _2e.si, _2e.tvd, _2d.fnm, _2c.tvd, _2g.fdt, _2e.tvx, _28.fdt, _2d_Lucene50_0.tip, _2c_Lucene50_0.dvm, _2d.nvd],delegate=[_10.fdt, _10.fdx, _10.fnm, _10.nvd, _10.nvm, _10.si, _10.tvd, _10.tvx, _10_Lucene50_0.doc, _10_Lucene50_0.dvd, _10_Lucene50_0.dvm, _10_Lucene50_0.pos, _10_Lucene50_0.tim, _10_Lucene50_0.tip, _11.fdt, _11.fdx, _11.fnm, _11.nvd, _11.nvm, _11.si, _11.tvd, _11.tvx, _11_Lucene50_0.doc, _11_Lucene50_0.dvd, _11_Lucene50_0.dvm, _11_Lucene50_0.pos, _11_Lucene50_0.tim, _11_Lucene50_0.tip, _12.fdt, _12.fdx, _12.fnm, _12.nvd, _12.nvm, _12.si, _12.tvd, _12.tvx, _12_Lucene50_0.doc, _12_Lucene50_0.dvd, _12_Lucene50_0.dvm, _12_Lucene50_0.pos, _12_Lucene50_0.tim, _12_Lucene50_0.tip, _13.fdt, _13.fdx, _13.fnm, _13.nvd, _13.nvm, _13.si, _13.tvd, _13.tvx, _13_Lucene50_0.doc, _13_Lucene50_0.dvd, _13_Lucene50_0.dvm, _13_Lucene50_0.pos, _13_Lucene50_0.tim, _13_Lucene50_0.tip, _14.fdt, _14.fdx, _14.fnm, _14.nvd, _14.nvm, _14.si, _14.tvd, _14.tvx, _14_Lucene50_0.doc, _14_Lucene50_0.dvd, _14_Lucene50_0.dvm, _14_Lucene50_0.pos, _14_Lucene50_0.tim, _14_Lucene50_0.tip, _15.fdt, _15.fdx, _15.fnm, _15.nvd, _15.nvm, _15.si, _15.tvd, _15.tvx, _15_Lucene50_0.doc, _15_Lucene50_0.dvd, _15_Lucene50_0.dvm, _15_Lucene50_0.pos, _15_Lucene50_0.tim, _15_Lucene50_0.tip, _1f.fdt, _1f.fdx,
_1f.fnm, _1f.nvd, _1f.nvm, _1f.si, _1f.tvd, _1f.tvx, _1f_Lucene50_0.doc, _1f_Lucene50_0.dvd, _1f_Lucene50_0.dvm, _1f_Lucene50_0.pos, _1f_Lucene50_0.tim, _1f_Lucene50_0.tip, _1g.fdt, _1g.fdx, _1g.fnm, _1g.nvd, _1g.nvm, _1g.si, _1g.tvd, _1g.tvx, _1g_Lucene50_0.doc, _1g_Lucene50_0.dvd, _1g_Lucene50_0.dvm, _1g_Lucene50_0.pos, _1g_Lucene50_0.tim, _1g_Lucene50_0.tip, _1h.fdt, _1h.fdx, _1h.fnm, _1h.nvd, _1h.nvm, _1h.si, _1h.tvd, _1h.tvx, _1h_Lucene50_0.doc, _1h_Lucene50_0.dvd, _1h_Lucene50_0.dvm, _1h_Lucene50_0.pos, _1h_Lucene50_0.tim, _1h_Lucene50_0.tip, _1i.fdt, _1i.fdx, _1i.fnm, _1i.nvd, _1i.nvm, _1i.si, _1i.tvd, _1i.tvx, _1i_Lucene50_0.doc, _1i_Lucene50_0.dvd, _1i_Lucene50_0.dvm, _1i_Lucene50_0.pos, _1i_Lucene50_0.tim, _1i_Lucene50_0.tip, _1j.fdt, _1j.fdx, _1j.fnm, _1j.nvd, _1j.nvm, _1j.si, _1j.tvd, _1j.tvx, _1j_Lucene50_0.doc, _1j_Lucene50_0.dvd, _1j_Lucene50_0.dvm, _1j_Lucene50_0.pos, _1j_Lucene50_0.tim, _1j_Lucene50_0.tip, _1k.fdt, _1k.fdx, _1k.fnm, _1k.nvd, _1k.nvm, _1k.si, _1k.tvd, _1k.tvx, _1k_Lucene50_0.doc, _1k_Lucene50_0.dvd, _1k_Lucene50_0.dvm, _1k_Lucene50_0.pos, _1k_Lucene50_0.tim, _1k_Lucene50_0.tip, _1l.fdt, _1l.fdx, _1l.fnm, _1l.nvd, _1l.nvm, _1l.si, _1l.tvd, _1l.tvx, _1l_Lucene50_0.doc,
New To Solr, getting error using the quick start guide
Hi, I am new to Solr and am trying to run through the quick start guide (http://lucene.apache.org/solr/quickstart.html). The installation seems fine, but then I run: bin/solr start -e cloud -noprompt I get: Welcome to the SolrCloud example! Starting up 2 Solr nodes for your example SolrCloud cluster. Starting up SolrCloud node1 on port 8983 using command: solr start -cloud -s example/cloud/node1/solr -p 8983 Waiting to see Solr listening on port 8983 [|] Started Solr server on port 8983 (pid=15536). Happy searching! Starting node2 on port 7574 using command: solr start -cloud -s example/cloud/node2/solr -p 7574 -z localhost:9983 Waiting to see Solr listening on port 7574 [/] Started Solr server on port 7574 (pid=15798). Happy searching! Then I run in another console, because the first one is still occupied with Solr: bin/post -c gettingstarted docs/ I get: java -classpath /usr/lib/solr-5.0.0/dist/solr-core-5.0.0.jar -Dauto=yes -Dc=gettingstarted -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool docs/ SimplePostTool version 5.0.0 Posting files to [base] url http://localhost:8983/solr/gettingstarted/update... Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log Entering recursive mode, max depth=999, delay=0s Indexing directory docs (3 files, depth=0) POSTing file quickstart.html (text/html) to [base]/extract SimplePostTool: FATAL: Connection error (is Solr running at http://localhost:8983/solr/gettingstarted/update ?): java.net.ConnectException: Connection timed out Meanwhile, in the console that I used to start Solr, I get: WARN - 2015-03-27 18:41:15.077; org.apache.solr.util.SolrCLI; Request to http://localhost:8983/solr/admin/info/system failed due to: Connection refused, sleeping for 5 seconds before re-trying the request ...
Exception in thread "main" java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:117) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:178) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:610) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:445) at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:214) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:160) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:136) at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:512) at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:456) at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:466) at org.apache.solr.util.SolrCLI.getZkHost(SolrCLI.java:1113) at org.apache.solr.util.SolrCLI$CreateCollectionTool.runTool(SolrCLI.java:1155) at org.apache.solr.util.SolrCLI.main(SolrCLI.java:203) SolrCloud example running, please visit http://localhost:8983/solr The console then exits the process. I can open http://localhost:8983/solr/admin/info/system in my web browser and it shows an XML file. http://localhost:8983/solr/gettingstarted/update in my web browser gives me: HTTP ERROR 404 Problem accessing /solr/gettingstarted/update. Reason: Not Found Powered by Jetty:// http://localhost:8983/solr/#/ shows data in my web browser, but the cloud tab is empty under graph. Any advice anyone can give me to get started with this product would be very appreciated. All the best. Will Ferrer