Re: Disable Caching
Thanks for the replies.

2012/10/17 Otis Gospodnetic otis.gospodne...@gmail.com

Hi,

If you are not searching against your master, and you shouldn't (and it sounds like you aren't), then you don't have to worry about disabling caches - they will just remain empty. You could comment them out, but I think that won't actually disable them. Warmup queries you can just comment out in solrconfig.xml.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

On Wed, Oct 17, 2012 at 12:25 PM, Anderson vasconcelos anderson.v...@gmail.com wrote:

Hi

I have a server that just indexes data and synchronizes that data to the slaves. In my architecture, I have one master server that only receives index requests and n slaves that receive only search requests. I want to disable the cache on the master server, since it never receives a search request. Is this the best way? Can I do this? What about warming searches, must I disable those too? I'm using Solr 3.6.0.

Thanks
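What Otis describes might look like this in the master's solrconfig.xml - a sketch based on the stock 3.x example config (cache names are standard; the size="0" values are just a way to make the intent explicit, since on a non-searching master the caches stay empty anyway):

```xml
<query>
  <!-- On a master that never serves queries these caches simply remain empty;
       size="0" makes that explicit. -->
  <filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>

  <!-- Warmup queries are just the newSearcher/firstSearcher listeners;
       commenting them out disables warming entirely:
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
    </arr>
  </listener>
  -->
</query>
```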
Re: Question about wildcards
Hi. In debug mode, the generated query was:

<str name="rawquerystring">field:*2231-7</str>
<str name="querystring">field:*2231-7</str>
<str name="parsedquery">field:*2231-7</str>
<str name="parsedquery_toString">field:*2231-7</str>

The analysis of indexing the text .2231-7 produces this result:

Index Analyzer: .22317 .22317 .22317 .22317 #1;1322. #1;7 .22317

And searching for *2231-7 produces this result:

Query Analyzer: 22317 22317 22317 22317 22317

I don't understand why it doesn't find results when I use field:*2231-7. When I use field:*2231, without the -7, the document is found. As Ahmet said, I think the -7 is being used to exclude the document, but the debug output doesn't show this. Any idea how to solve this?

Thanks

2012/5/18 Ahmet Arslan iori...@yahoo.com

I have a field that was indexed with the string .2231-7. When I search using '*' or '?', like *2231-7, the query doesn't return results. When I remove the -7 substring and search again using *2231, the query returns results. Finally, when I search using .2231-7, the query returns results too.

Maybe the standard tokenizer is splitting .2231-7 into multiple tokens? You can check that on the admin/analysis page. Maybe -7 is treated as a negative clause? You can check that with debugQuery=on
Re: Question about wildcards
I changed the field type of the field to the following:

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

As you can see, I just kept the WhitespaceTokenizerFactory. That works. Now I can find the document using *2231?7, *2231*7, *2231-7, *2231*, and .2231-7. As far as I can see, with this tokenizer the text is not split. Is that the best way to solve this?

Thanks

2012/5/21 Anderson vasconcelos anderson.v...@gmail.com

Hi. In debug mode, the generated query was:

<str name="rawquerystring">field:*2231-7</str>
<str name="querystring">field:*2231-7</str>
<str name="parsedquery">field:*2231-7</str>
<str name="parsedquery_toString">field:*2231-7</str>

The analysis of indexing the text .2231-7 produces this result:

Index Analyzer: .22317 .22317 .22317 .22317 #1;1322. #1;7 .22317

And searching for *2231-7 produces this result:

Query Analyzer: 22317 22317 22317 22317 22317

I don't understand why it doesn't find results when I use field:*2231-7. When I use field:*2231, without the -7, the document is found. As Ahmet said, I think the -7 is being used to exclude the document, but the debug output doesn't show this. Any idea how to solve this?

Thanks

2012/5/18 Ahmet Arslan iori...@yahoo.com

I have a field that was indexed with the string .2231-7. When I search using '*' or '?', like *2231-7, the query doesn't return results. When I remove the -7 substring and search again using *2231, the query returns results. Finally, when I search using .2231-7, the query returns results too.

Maybe the standard tokenizer is splitting .2231-7 into multiple tokens? You can check that on the admin/analysis page. Maybe -7 is treated as a negative clause? You can check that with debugQuery=on
Re: Question about wildcards
Thanks, everyone, for the explanations.

Anderson

2012/5/21 Jack Krupansky j...@basetechnology.com

And, generally, when I see a field that has values like .2231-7, it should be a string field rather than tokenized text. As a string, you can then do straight wildcards without surprises.

-- Jack Krupansky

-Original Message- From: Jack Krupansky Sent: Monday, May 21, 2012 11:23 AM To: solr-user@lucene.apache.org Subject: Re: Question about wildcards

Before Solr 3.6, which added MultiTermAwareComponent for analyzers, the presence of a wildcard completely short-circuited (prevented) the query-time analysis, so you had to manually emulate all steps of the query analyzer yourself if you wanted to do a wildcard. Even with 3.6, not all filters are multi-term aware. See: http://wiki.apache.org/solr/MultitermQueryAnalysis

Do a query for .2231-7 and that will tell you which analyzer steps you will have to do manually.

-- Jack Krupansky

-Original Message- From: Anderson vasconcelos Sent: Monday, May 21, 2012 11:03 AM To: solr-user@lucene.apache.org Subject: Re: Question about wildcards

Hi. In debug mode, the generated query was:

<str name="rawquerystring">field:*2231-7</str>
<str name="querystring">field:*2231-7</str>
<str name="parsedquery">field:*2231-7</str>
<str name="parsedquery_toString">field:*2231-7</str>

The analysis of indexing the text .2231-7 produces this result:

Index Analyzer: .22317 .22317 .22317 .22317 #1;1322. #1;7 .22317

And searching for *2231-7 produces this result:

Query Analyzer: 22317 22317 22317 22317 22317

I don't understand why it doesn't find results when I use field:*2231-7. When I use field:*2231, without the -7, the document is found. As Ahmet said, I think the -7 is being used to exclude the document, but the debug output doesn't show this. Any idea how to solve this?

Thanks

2012/5/18 Ahmet Arslan iori...@yahoo.com

I have a field that was indexed with the string .2231-7. When I search using '*' or '?', like *2231-7, the query doesn't return results. When I remove the -7 substring and search again using *2231, the query returns results. Finally, when I search using .2231-7, the query returns results too.

Maybe the standard tokenizer is splitting .2231-7 into multiple tokens? You can check that on the admin/analysis page. Maybe -7 is treated as a negative clause? You can check that with debugQuery=on
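Jack's string-field suggestion would look like this in schema.xml - a sketch, with an illustrative field name (solr.StrField indexes the whole value as one token, so a wildcard like *2231-7 matches with no analysis surprises):

```xml
<!-- type declaration (usually already present in the stock schema) -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- the field name "code" is hypothetical -->
<field name="code" type="string" indexed="true" stored="true"/>
```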
Re: Question about cache
Hi Kuli

I was just raising a concern. Thanks for the explanation.

Regards
Anderson

2012/5/11 Shawn Heisey s...@elyograg.org

On 5/11/2012 9:30 AM, Anderson vasconcelos wrote:

Hi Kuli

The free -m command gives me:

             total       used       free     shared    buffers     cached
Mem:          9991       9934         57          0         75       5759
-/+ buffers/cache:       4099       5892
Swap:         8189       3395       4793

You can see that only 57 MB is free and 5 GB is cached. In the top command, the glassfish process used 79.7% of memory:

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
4336 root  21   0 29.7g 7.8g 4.0g S  0.3 79.7  5349:14 java

If I increase the server's memory by another 2GB, will the OS use this additional 2GB for cache? Do I need to increase the memory size?

Are you having a problem you need to track down, or are you just raising a concern because your memory usage is not what you expected? It is 100% normal for a Linux system to show only a few megabytes of memory free. To make things run faster, the OS caches disk data using memory that is not directly allocated to programs or the OS itself. If a program requests memory, the OS will allocate it immediately; it simply forgets the least used part of the cache. Windows does this too, but Microsoft decided that novice users would freak out if the task manager were to give them the true picture of memory usage, so they exclude disk cache when calculating free memory. It's not really a lie, just not the full true picture.

A recent version of Solr (3.5, if I remember right) made a major change in the way that the index files are accessed. The way things are done now is almost always faster, but it makes the memory usage in the top command completely useless. The VIRT memory size includes all of your index files, plus all the memory that the java process is capable of allocating, plus a little that I can't quite account for. The RES size is also bigger than expected, and I'm not sure why. Based on the numbers above, I am guessing that your indexes take up 15-20GB of disk space.

For best performance, you would want a machine with at least 24GB of RAM so that your entire index can fit into the OS disk cache. The 10GB you have (which leaves the 5.8GB for disk cache, as you have seen) may be good enough to cache the frequently accessed portions of your index, so your performance might be just fine.

Thanks,
Shawn
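Shawn's point about free -m can be made concrete: the memory actually available to applications is free + buffers + cached, which is what the -/+ buffers/cache line reports. A tiny illustration using the numbers quoted in this thread:

```java
public class FreeMemory {
    // Memory effectively available to applications: the kernel can reclaim
    // buffers and page cache on demand, so both count as available.
    static int available(int free, int buffers, int cached) {
        return free + buffers + cached;
    }

    public static void main(String[] args) {
        // Figures in MB from the "free -m" output in this thread.
        // 57 + 75 + 5759 = 5891, i.e. the ~5892 MB "free" value on the
        // -/+ buffers/cache line (the difference is rounding in free -m).
        System.out.println(available(57, 75, 5759)); // 5891
    }
}
```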
Re: Identify indexed terms of document
Thanks

2012/5/11 Michael Kuhlmann k...@solarier.de

On 10.05.2012 22:27, Ahmet Arslan wrote:

Is it possible to see which terms are indexed for a field of a document where stored=false?

One way is to use http://wiki.apache.org/solr/LukeRequestHandler

Another approach is this:
- Query for exactly this document, e.g. by using the unique field
- Add this to your URL parameters: facet=true&facet.field=<your field>&facet.mincount=1

-Kuli
Re: Question about cache
Hi Kuli

The free -m command gives me:

             total       used       free     shared    buffers     cached
Mem:          9991       9934         57          0         75       5759
-/+ buffers/cache:       4099       5892
Swap:         8189       3395       4793

You can see that only 57 MB is free and 5 GB is cached. In the top command, the glassfish process used 79.7% of memory:

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
4336 root  21   0 29.7g 7.8g 4.0g S  0.3 79.7  5349:14 java

If I increase the server's memory by another 2GB, will the OS use this additional 2GB for cache? Do I need to increase the memory size?

Thanks

2012/5/11 Michael Kuhlmann k...@solarier.de

On 11.05.2012 15:48, Anderson vasconcelos wrote:

Hi

Analysing the Solr server in glassfish with JConsole, the heap memory usage doesn't go above 4GB. But when the top command is executed, the free memory in the operating system is only 200MB. The physical memory is only 10GB. Why does the machine use so much memory? Are the cache fields included in heap memory usage? Is the other 5.8GB the operating system's cache of recently opened files? Is there some way to tune this?

Thanks

If the OS is Linux or some other Unix variant, it keeps as much disk content in memory as possible. Whenever new memory is needed, it automatically gets freed. That won't take time, and there's no need to tune anything. Don't look at the free memory in the top command; it's nearly useless. Have a look at how much memory your Glassfish process is consuming, and use the 'free' command (maybe together with the -m parameter for human readability) to find out more about your free memory. The -/+ buffers/cache line is relevant.

Greetings,
Kuli
Re: Question about Streaming Update Solr Server
Could anyone reply to these questions? Thanks

2012/3/5 Anderson vasconcelos anderson.v...@gmail.com

Hi

I have some questions about StreamingUpdateSolrServer.

1) What is the queue size parameter? Is it the number of documents in each thread?
2) When I configured it like StreamingUpdateSolrServer(URL, 1000, 5), indexing ran OK. But when I raised the number of threads, like new StreamingUpdateSolrServer(URL, 1000, 15), I received a java.net.SocketException: Broken pipe. Why?
3) When I index using the addBean method, it opens the maximum number of threads I configured. But when I use addBeans, it opens only one thread. Is this correct?

Thanks
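On question 1: as I understand it, the queue size is the number of documents buffered before callers block, and the thread count is how many workers drain that buffer concurrently (in SolrJ, each over its own HTTP connection). The relationship can be sketched with plain java.util.concurrent - this is an analogy for the producer/consumer shape, not the SolrJ implementation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueSketch {
    // Mirrors new StreamingUpdateSolrServer(url, queueSize, threadCount):
    // the producer blocks once queueSize docs are buffered; threadCount
    // workers drain the queue concurrently. Returns how many docs were "sent".
    static int drain(int queueSize, int threadCount, int totalDocs) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueSize);
        AtomicInteger sent = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(threadCount);
        for (int t = 0; t < threadCount; t++) {
            workers.submit(() -> {
                try {
                    String doc;
                    while (!(doc = queue.take()).equals("EOF")) {
                        sent.incrementAndGet(); // stand-in for an HTTP update request
                    }
                } catch (InterruptedException ignored) { }
            });
        }
        for (int i = 0; i < totalDocs; i++) {
            queue.put("doc" + i); // blocks whenever the queue is full
        }
        for (int t = 0; t < threadCount; t++) {
            queue.put("EOF"); // one poison pill per worker
        }
        workers.shutdown();
        workers.awaitTermination(30, TimeUnit.SECONDS);
        return sent.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(drain(100, 5, 1000)); // 1000
    }
}
```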
Re: Permissions and user to access administrative interface
Thanks for the responses. I will create rules via .htaccess.

Regards
Vasconcelos

2012/2/13 Ge, Yao (Y.) y...@ford.com

I can only speak from my experience with Tomcat. First make sure the available authentication modes are enabled by checking server.xml. I added a few roles in tomcat-users.xml and added individual user ids/passwords to these roles. For example, you can separate Search, Update, and Admin roles. Then modify web.xml to map different modules to different roles.

-Yao

-Original Message- From: Em [mailto:mailformailingli...@yahoo.de] Sent: Monday, February 13, 2012 11:05 AM To: solr-user@lucene.apache.org Subject: Re: Permissions and user to access administrative interface

Hi Anderson,

you will need to rearrange the JSPs a little bit to do what you want. If you do so, you can create rules via .htaccess. Otherwise I would suggest you look for a commercial distribution of Solr which might fit your needs.

Regards,
Em

On 13.02.2012 16:48, Anderson vasconcelos wrote:

Hi All

Is there some way to add users and permissions on the Solr administration page? I need to restrict users' access to the administration page. I just want to expose the query section to certain users. In addition, I want to restrict access to the cores per user. Something like this:

Core 1 - Users: John, Paul, Carter
  Full interface: John, Paul
  Only search interface: Carter
Core 2 - Users: John, Mary
  Full interface: John
  Only search interface: Mary

Is that possible? Thanks
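The Tomcat approach Yao describes is container-managed security in web.xml; a sketch, where the role names, realm name, and URL patterns are all illustrative and would need to match your actual core paths:

```xml
<!-- web.xml fragment (sketch) -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr admin</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Core 1 search only</web-resource-name>
    <url-pattern>/core1/select</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>core1-search</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>
<security-role><role-name>solr-admin</role-name></security-role>
<security-role><role-name>core1-search</role-name></security-role>
```

The users then go in tomcat-users.xml, e.g. a hypothetical <user username="carter" password="..." roles="core1-search"/>.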
Re: Using UUID for uniqueId
Thanks

2012/2/8 François Schiettecatte fschietteca...@gmail.com

Anderson

I would say that this is highly unlikely, but you would need to pay attention to how they are generated. This would be a good place to start: http://en.wikipedia.org/wiki/Universally_unique_identifier

Cheers

François

On Feb 8, 2012, at 1:31 PM, Anderson vasconcelos wrote:

Hi all

If I use a UUID as the uniqueId, will I have problems in the future if I break my index into shards? Could UUID generation produce the same UUID on different machines?

Thanks
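François's point can be sanity-checked locally: java.util.UUID.randomUUID() produces version-4 UUIDs with 122 random bits, so collisions across machines are vanishingly unlikely as long as each JVM has a decent entropy source. A small check (the sample size is arbitrary):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class UuidCheck {
    // Generates n random UUIDs and returns how many distinct values were seen.
    static int distinct(int n) {
        Set<UUID> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            UUID id = UUID.randomUUID();
            if (id.version() != 4) {
                throw new IllegalStateException("expected a version-4 (random) UUID");
            }
            seen.add(id);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinct(100_000)); // 100000 - no collisions in this sample
    }
}
```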
Re: Multiple Data Directories and 1 SOLR instance
Nitin,

Use a multicore configuration. For each organization, you create a new core with specific configurations. You will have one Solr instance and one Solr admin tool to manage all the cores. The configuration is simple.

Good luck

Regards
Anderson

2012/1/26 David Radunz da...@boxen.net

Hey,

Sounds like what you need to set up is a Multiple Cores configuration. At first I confused this with multi-core CPUs, but that's not what it's about. Basically it's a way to run multiple 'solr' cores/indexes/configurations from a single Solr instance (which will scale better, as the resources will be shared). Have a read anyway: http://wiki.apache.org/solr/CoreAdmin

Cheers,
David

On 27/01/2012 8:18 AM, Nitin Arora wrote:

Hi,

We are using Solr/Lucene to index/search data about the users of an organization. The nature of the data is brief information about each user's work. Our data index requirement is to have segregated stores for each organization. Currently we have 10 organizations, and we have to run 10 different instances of Solr to serve search results per organization. As new organizations join, it is getting difficult to manage this many instances. I think there is now a need to use one Solr instance and then have 10/multiple different data directories, one per organization. When an index/search request is received in Solr, we decide the data directory based on the organization.

1. Is it possible to do this in Solr, and how can we achieve it?
2. Will it be a good design to use Solr like this?
3. Is there any impact on scalability if we manage the separate data directories inside Solr?

Thanks in advance
Nitin

--
View this message in context: http://lucene.472066.n3.nabble.com/Multiple-Data-Directories-and-1-SOLR-instance-tp3691644p3691644.html
Sent from the Solr - User mailing list archive at Nabble.com.
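The multicore layout Anderson describes is driven by a solr.xml file in the Solr home directory - a sketch following the CoreAdmin wiki, with hypothetical organization names; each instanceDir holds its own conf/ (schema.xml, solrconfig.xml) and data/ directory:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core (index + configuration) per organization -->
    <core name="org1" instanceDir="org1"/>
    <core name="org2" instanceDir="org2"/>
  </cores>
</solr>
```

Requests are then routed by URL, e.g. /solr/org1/select vs. /solr/org2/select, and new cores can be added through the CoreAdmin handler without restarting.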
Re: solr replication
Hi Parvin

I did something that may help you. I set up Apache (with mod_proxy and mod_proxy_balancer) as a front end and use it to distribute the requests of my application. Requests for /update or /optimize I redirect to the master (or masters) server, and search (/select) requests I redirect to the slaves. Example:

<Proxy balancer://solrclusterindex>
  BalancerMember http://127.0.0.1:8080/apache-solr-1.4.1/ disablereuse=On route=jvm1
</Proxy>

<Proxy balancer://solrclustersearch>
  BalancerMember http://127.0.0.1:8080/apache-solr-1.4.1/ disablereuse=On route=jvm1
  BalancerMember http://10.16.129.61:8080/apache-solr-1.4.1/ disablereuse=On route=jvm2
</Proxy>

ProxyPassMatch /solrcluster(.*)/update(.*)$ balancer://solrclusterindex$1/update$2
ProxyPassMatch /solrcluster(.*)/select(.*)$ balancer://solrclustersearch$1/select$2

I hope it helps you
Re: Indexing failover and replication
Thanks for the reply, Erick. I will set up the replication to both masters manually. Thanks

2012/1/25, Erick Erickson erickerick...@gmail.com:

No, there's no good way to have a single slave know about two masters and just use the right one. It sounds like you've got each machine being both a master and a slave? This is not supported. What you probably want to do is either set up a repeater or just index to the two masters and manually switch back to the primary if the primary goes down, having all replication happen from the master.

Best
Erick

On Tue, Jan 24, 2012 at 11:36 AM, Anderson vasconcelos anderson.v...@gmail.com wrote:

Hi

I'm now doing a test with replication using Solr 1.4.1. I configured two servers (server1 and server2) as master/slave to synchronize both. I put Apache on the front side, and we index sometimes on server1 and sometimes on server2. I realized that both index servers are now confused. In the Solr data folder, many index folders were created with the timestamp of the synchronization (example: index.20120124041340), with some segments inside.

I thought it was possible to index on two master servers and then synchronize both using replication. Is it really possible to do this with the replication mechanism? If it is possible, what have I done wrong? I need more than one node for indexing to guarantee failover for indexing. Is multi-master the best way to guarantee failover for indexing?

Thanks
Re: Size of index to use shard
Apparently it's not so easy to determine when to break the content into pieces. I'll investigate further the number of documents, the size of each document, and what kind of search is being used. It seems I will have to do a load test to identify the cutoff point at which to begin using the shard strategy.

Thanks

2012/1/24, Dmitry Kan dmitry@gmail.com:

Hi,

The article you gave mentions 13GB of index size. That is quite a small index from our perspective. We have noticed that at least Solr 3.4 has some sort of choking point with respect to growing index size. It just becomes substantially slower than what we need (a query on average taking more than 3-4 seconds) once the index size crosses a magic level (about 80GB, following our practical observations). We try to keep our indices at around 60-70GB for fast searches and above 100GB for slow ones. We also route the majority of user queries to the fast indices. Yes, caching may help, but we cannot necessarily afford adding more RAM for bigger indices. BTW, our documents are very small, so in a 100GB index we can have around 200 million documents.

It would be interesting to see how you manage to ensure q-times under 1 second with an index of 250GB. How many documents / facets do you ask for at most at a time? FYI, we ask for a thousand facets in one go.

Regards,
Dmitry

On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:

Hi,

it depends on your hardware. Read this: http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/

Think about your cache config (few updates, big caches) and a good hardware infrastructure. In my case I can handle a 250GB index with 100 million docs on an i7 machine with RAID 10 and 24GB RAM = q-times under 1 sec.

Regards
Vadim

2012/1/24 Anderson vasconcelos anderson.v...@gmail.com:

Hi

Is there some index size (or number of docs) at which it is necessary to break the index into shards? I have an index 100GB in size. This index grows by 10GB per year (I don't have information on how many docs it has), and the docs will never be deleted. Thinking 30 years ahead, the index will be 400GB in size. I think it is not necessary to break it into shards, because I don't consider this a large index. Am I correct? What is a real large index?

Thanks
Re: Size of index to use shard
Thanks for the explanation Erick :)

2012/1/24, Erick Erickson erickerick...@gmail.com:

Talking about index size can be very misleading. Take a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names. Note that the *.fdt and *.fdx files are used for stored fields, i.e. the verbatim copy of data put in the index when you specify stored=true. These files have virtually no impact on search speed. So, if your *.fdx and *.fdt files are 90G out of a 100G index, it is a much different thing than if these files are 10G out of a 100G index. And this doesn't even mention the peculiarities of your query mix. Nor does it say a thing about whether your cheapest alternative is to add more memory.

Anderson's method is about the only reliable one: you just have to test with your index and real queries. At some point you'll find your tipping point, typically when you come under memory pressure. And it's a balancing act between how much memory you allocate to the JVM and how much you leave for the OS.

Bottom line: no hard and fast numbers. And you should periodically re-test the empirical numbers you *do* arrive at...

Best
Erick

On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos anderson.v...@gmail.com wrote:

Apparently it's not so easy to determine when to break the content into pieces. I'll investigate further the number of documents, the size of each document, and what kind of search is being used. It seems I will have to do a load test to identify the cutoff point at which to begin using the shard strategy.

Thanks

2012/1/24, Dmitry Kan dmitry@gmail.com:

Hi,

The article you gave mentions 13GB of index size. That is quite a small index from our perspective. We have noticed that at least Solr 3.4 has some sort of choking point with respect to growing index size. It just becomes substantially slower than what we need (a query on average taking more than 3-4 seconds) once the index size crosses a magic level (about 80GB, following our practical observations). We try to keep our indices at around 60-70GB for fast searches and above 100GB for slow ones. We also route the majority of user queries to the fast indices. Yes, caching may help, but we cannot necessarily afford adding more RAM for bigger indices. BTW, our documents are very small, so in a 100GB index we can have around 200 million documents.

It would be interesting to see how you manage to ensure q-times under 1 second with an index of 250GB. How many documents / facets do you ask for at most at a time? FYI, we ask for a thousand facets in one go.

Regards,
Dmitry

On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:

Hi,

it depends on your hardware. Read this: http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/

Think about your cache config (few updates, big caches) and a good hardware infrastructure. In my case I can handle a 250GB index with 100 million docs on an i7 machine with RAID 10 and 24GB RAM = q-times under 1 sec.

Regards
Vadim

2012/1/24 Anderson vasconcelos anderson.v...@gmail.com:

Hi

Is there some index size (or number of docs) at which it is necessary to break the index into shards? I have an index 100GB in size. This index grows by 10GB per year (I don't have information on how many docs it has), and the docs will never be deleted. Thinking 30 years ahead, the index will be 400GB in size. I think it is not necessary to break it into shards, because I don't consider this a large index. Am I correct? What is a real large index?

Thanks
Indexing failover and replication
Hi

I'm now doing a test with replication using Solr 1.4.1. I configured two servers (server1 and server2) as master/slave to synchronize both. I put Apache on the front side, and we index sometimes on server1 and sometimes on server2. I realized that both index servers are now confused. In the Solr data folder, many index folders were created with the timestamp of the synchronization (example: index.20120124041340), with some segments inside.

I thought it was possible to index on two master servers and then synchronize both using replication. Is it really possible to do this with the replication mechanism? If it is possible, what have I done wrong? I need more than one node for indexing to guarantee failover for indexing. Is multi-master the best way to guarantee failover for indexing?

Thanks
Size of index to use shard
Hi

Is there some index size (or number of docs) at which it is necessary to break the index into shards? I have an index 100GB in size. This index grows by 10GB per year (I don't have information on how many docs it has), and the docs will never be deleted. Thinking 30 years ahead, the index will be 400GB in size. I think it is not necessary to break it into shards, because I don't consider this a large index. Am I correct? What is a real large index?

Thanks
Re: Phonetic search for portuguese
Could anyone help? Thanks

2012/1/20, Anderson vasconcelos anderson.v...@gmail.com:

Hi

Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex, Caverphone) only for the English language, or do they work for other languages? Is there a phonetic filter for Portuguese? If there isn't, how can I implement one?

Thanks
Re: Phonetic search for portuguese
Hi Gora, thanks for the reply. I'm interested in seeing how you built this solution, but my timeline is not too long and I need to create some solution for my client early. If anyone knows some other simple and fast solution, please post it on this thread. Gora, could you describe how you implemented the custom filter factory and how you used it in Solr?

Thanks

2012/1/22, Gora Mohanty g...@mimirtech.com:

On Sun, Jan 22, 2012 at 5:47 PM, Anderson vasconcelos anderson.v...@gmail.com wrote:

Could anyone help? Thanks

2012/1/20, Anderson vasconcelos anderson.v...@gmail.com:

Hi

Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex, Caverphone) only for the English language, or do they work for other languages? Is there a phonetic filter for Portuguese? If there isn't, how can I implement one?

We did this, in another context, by using the open-source aspell library to handle the spell-checking for us. This has distinct advantages, as aspell is well-tested, handles sounds-like in a better manner (at least IMHO), and supports a wide variety of languages, including Portuguese. There are some drawbacks, as aspell only has C/C++ interfaces, and hence we built bindings on top of SWIG. Also, we handled the integration with Solr via a custom filter factory, though there are better ways to do this. Such a project would thus have dependencies on aspell and our custom code. If there is interest in this, we would be happy to open source this code; given our current schedule this could take 2-3 weeks.

Regards,
Gora
Re: Phonetic search for portuguese
Thanks a lot, Gora. I need to deliver the first release to my client on 25 January. With your explanation, I can better negotiate the delivery date of this feature for next month, because I have other business rules to deliver and this feature is more complex than I thought. I could help you share this solution with the Solr community. Maybe we can create some component on Google Code, or something like that, which any Solr user can use.

2012/1/23, Gora Mohanty g...@mimirtech.com:

On Mon, Jan 23, 2012 at 5:58 AM, Anderson vasconcelos anderson.v...@gmail.com wrote:

Hi Gora, thanks for the reply. I'm interested in seeing how you built this solution, but my timeline is not too long and I need to create some solution for my client early. If anyone knows some other simple and fast solution, please post it on this thread.

What is your timeline? I will see if we can expedite the open-sourcing of this.

Gora, could you describe how you implemented the custom filter factory and how you used it in Solr?

[...]

That part is quite simple, though it is possible that I have not correctly addressed all issues for a custom FilterFactory. Please see:

AspellFilterFactory: http://pastebin.com/jTBcfmd1
AspellFilter: http://pastebin.com/jDDKrPiK

The latter loads a java_aspell library that is created by SWIG by setting up Java bindings on top of SWIG and configuring it for the language of interest. Next, you will need a library that encapsulates various aspell functionality in Java. I am afraid that this one is a little long:

Suggest: http://pastebin.com/6NrGCVma

Finally, you will have to set up the Solr schema to use this filter factory; e.g., one could create a new Solr TextField where solr.DoubleMetaphoneFilterFactory is replaced with com.mimirtech.search.solr.analysis.AspellFilterFactory. We can discuss further how to set this up, but we should probably take that discussion off-list.

Regards,
Gora
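For anyone curious what these filters actually compute, here is a sketch of classic American Soundex in plain Java (simplified: the h/w edge case is omitted). The letter table is English-oriented, which is exactly why these filters fit Portuguese poorly - a Portuguese variant would need its own table plus rules for digraphs like nh, lh, and ch. This is illustrative only, not the aspell approach Gora describes:

```java
public class Soundex {
    // Code for each letter a-z; '0' marks vowels and other uncoded letters.
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String name) {
        String s = name.toUpperCase().replaceAll("[^A-Z]", "");
        if (s.isEmpty()) return "";
        // Keep the first letter, then append digits for following consonants,
        // skipping vowels and collapsing adjacent letters with the same code.
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char prev = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char code = CODES.charAt(s.charAt(i) - 'A');
            if (code != '0' && code != prev) out.append(code);
            prev = code;
        }
        while (out.length() < 4) out.append('0'); // pad to letter + 3 digits
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert"));  // R163
        System.out.println(soundex("Rupert"));  // R163 - same code, so they would match
        System.out.println(soundex("Jackson")); // J250
    }
}
```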
Re: HIbernate Search and SOLR Integration
Otis,

Isn't the DataImportHandler only for importing data from a database? I don't want to import data from the database. I just want to persist the object in my database and afterwards send the saved object to Solr. When the user finds some document using the Solr search, I need to return the persistent object (found in Solr, with the contents saved in the database). Is it possible to do this with DataImportHandler? If not, is there another solution, or do I have to do this merge in my application using an IN clause or a temporary table?

Thanks

2012/1/20 Otis Gospodnetic otis_gospodne...@yahoo.com

Hi Anderson,

Not sure if you saw http://wiki.apache.org/solr/DataImportHandler

Otis
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

- Original Message - From: Anderson vasconcelos anderson.v...@gmail.com To: solr-user solr-user@lucene.apache.org Cc: Sent: Thursday, January 19, 2012 10:08 PM Subject: HIbernate Search and SOLR Integration

Hi. Is it possible to integrate Hibernate Search with Solr? I want to use Hibernate Search in my entities and use Solr to do the work of indexing and searching. Hibernate Search would call Solr to search the index and then find the respective objects in the database. Is that possible? Does some configuration exist for this? If it's not possible, what is the best strategy to unify the search on the index with the search in the database using Solr: a manual join of results from the index with a database query using a temporary table or an IN clause?

Thanks
Re: HIbernate Search and SOLR Integration
Ok. I thought there was an easier way to do this using hibernate search. I will make this manually. Thanks for help 2012/1/20 Otis Gospodnetic otis_gospodne...@yahoo.com Hi, If you save all fields you want to display in search results, then you don't need to go to the database at search time. If you do not save all fields you want to display in search results, then you will need to first query Solr, get IDs of all matches you want to display, and then from your application do a SELECT with those IDs. DataImportHandler is for indexing data from DB and is not used at search-time. HTH Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html - Original Message - From: Anderson vasconcelos anderson.v...@gmail.com To: solr-user@lucene.apache.org; Otis Gospodnetic otis_gospodne...@yahoo.com Cc: Sent: Friday, January 20, 2012 8:33 AM Subject: Re: HIbernate Search and SOLR Integration Otis, The DataImportHandler is not only for import data from database? I don't wanna to import data from database. I just wanna to persist the object in my database and after send this saved object to SOLR. When the user find some document using the SOLR search, i need to return this persistent object (That was found in SOLR with the contents saved in database). It's possible do this with DataImporHandler? If not possible, has other solution or i have to make this merge in my aplication using in clause or temporary table? Thanks 2012/1/20 Otis Gospodnetic otis_gospodne...@yahoo.com Hi Anderson, Not sure if you saw http://wiki.apache.org/solr/DataImportHandler Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html - Original Message - From: Anderson vasconcelos anderson.v...@gmail.com To: solr-user solr-user@lucene.apache.org Cc: Sent: Thursday, January 19, 2012 10:08 PM Subject: HIbernate Search and SOLR Integration Hi. It's possible to integrate Hibernate Search with SOLR? 
I want to use Hibernate Search on my entities and have SOLR do the work of indexing and searching: Hibernate Search would call SOLR to search the index and then fetch the corresponding objects from the database. Is that possible? Is there some configuration for this? If it's not possible, what is the best strategy to unify search on the index with search in the database using SOLR? A manual join of the index results against the database query, using a temporary table or an IN clause? Thanks
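A minimal sketch of the two-step pattern Otis describes above (query Solr for matching IDs, then load the rows from the database in a single SELECT). The table name, the ID values, and the buildSelect helper are all illustrative; this is not a Hibernate Search or SolrJ API:

```java
import java.util.List;
import java.util.StringJoiner;

public class SolrThenDatabase {

    // Build the SQL that fetches the full rows for the IDs Solr returned.
    public static String buildSelect(String table, List<Long> ids) {
        StringJoiner in = new StringJoiner(", ", "(", ")");
        for (Long id : ids) {
            in.add(String.valueOf(id));
        }
        return "SELECT * FROM " + table + " WHERE id IN " + in;
    }

    public static void main(String[] args) {
        // Step 1 (not shown): run the Solr query and collect the stored id
        // field from each hit, e.g. via SolrJ's QueryResponse.getResults().
        List<Long> idsFromSolr = List.of(3L, 7L, 12L);

        // Step 2: fetch the entities from the database in one round trip.
        // prints: SELECT * FROM document WHERE id IN (3, 7, 12)
        System.out.println(buildSelect("document", idsFromSolr));
    }
}
```

The point of the IN clause is to avoid one database round trip per hit; results can then be reordered in the application to match Solr's relevance order.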
Phonetic search for portuguese
Hi, are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex, Caverphone) only for the English language, or do they work for other languages? Is there a phonetic filter for Portuguese? If there isn't, how can I implement one? Thanks
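For reference, a hedged sketch of one option: assuming Solr 3.6 or later, where BeiderMorseFilterFactory is available, the Beider-Morse algorithm supports several non-English languages, Portuguese among them. The type name text_phonetic_pt is illustrative, and the languageSet value should be checked against the version in use:

```xml
<fieldType name="text_phonetic_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Beider-Morse emits approximate phonetic codes; languageSet narrows
         the rule set to Portuguese instead of auto-detecting the language. -->
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC"
            ruleType="APPROX" concat="true" languageSet="portuguese"/>
  </analyzer>
</fieldType>
```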
Hibernate Search and SOLR Integration
Hi. Is it possible to integrate Hibernate Search with SOLR? I want to use Hibernate Search on my entities and have SOLR do the work of indexing and searching: Hibernate Search would call SOLR to search the index and then fetch the corresponding objects from the database. Is that possible? Is there some configuration for this? If it's not possible, what is the best strategy to unify search on the index with search in the database using SOLR? A manual join of the index results against the database query, using a temporary table or an IN clause? Thanks
Re: Migrate Lucene 2.9 To SOLR
OK. Thanks for the help. I'm going to try the migration. 2011/12/14 Chris Hostetter hossman_luc...@fucit.org : I have an old project that uses Lucene 2.9. Is it possible to use the index : created by Lucene in SOLR? Can I just copy the index to SOLR's data : directory, or is there some mechanism to import a Lucene index? you can use an index created directly with lucene libraries in Solr, but in order for Solr to understand that index and do anything meaningful with it you have to configure solr with a schema.xml file that makes sense given the custom code used to build that index (ie: what fields did you store, what fields did you index, what analyzers did you use, what fields did you index with term vectors, etc...) -Hoss
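Hoss's point above can be made concrete with a sketch. Everything here is illustrative, since the real field names and analyzers must mirror whatever the Lucene 2.9 code actually did. If, say, the old code indexed and stored a "title" field with StandardAnalyzer, the matching schema.xml entries might look like:

```xml
<!-- Solr can reference a Lucene Analyzer class directly; it must be the
     same analyzer the original indexing code used. -->
<fieldType name="text_std" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>

<!-- indexed/stored must match how the old Lucene code added the field. -->
<field name="title" type="text_std" indexed="true" stored="true"/>
```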
Migrate Lucene 2.9 To SOLR
Hi, I have an old project that uses Lucene 2.9. Is it possible to use the index created by Lucene in SOLR? Can I just copy the index to SOLR's data directory, or is there some mechanism to import a Lucene index? Thanks
Export Index Data.
Hi, is it possible to export a set of documents indexed in one Solr server in order to synchronize them with another Solr server? Thanks
Too Many Open Files
Hi all, when I send a delete query to SOLR using SolrJ, I receive this exception: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Too many open files 11:53:06,964 INFO [HttpMethodDirector] I/O exception (java.net.SocketException) caught when processing request: Too many open files Could anyone help me? How can I solve this? Thanks
Re: Too Many Open Files
Thanks for the responses. I instantiate one instance per request (per delete query, in my case). I have a lot of concurrent processes. If I reuse the same instance (to send, delete and remove data) in Solr, will I have trouble? My concern is that, if I do this, Solr will commit documents with data from another transaction. Thanks 2010/6/28 Michel Bottan freakco...@gmail.com Hi Anderson, If you are using SolrJ, it's recommended to reuse the same instance per Solr server. http://wiki.apache.org/solr/Solrj#CommonsHttpSolrServer But there are other scenarios which may cause this situation: 1. Another application running in the same Solr JVM which doesn't properly close sockets or file handles. 2. The open files limit configuration is low. Check your limits; read it from the JVM process info: cat /proc/1234/limits (where 1234 is your process ID) Cheers, Michel Bottan On Mon, Jun 28, 2010 at 1:18 PM, Erick Erickson erickerick...@gmail.com wrote: This probably means you're opening new readers without closing old ones. But that's just a guess. I'm guessing that this really has nothing to do with the delete itself, but the delete is what's finally pushing you over the limit.
I know this has been discussed before, try searching the mail archive for TooManyOpenFiles and/or File Handles. You could get much better information by providing more details, see: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, Jun 28, 2010 at 11:56 AM, Anderson vasconcelos anderson.v...@gmail.com wrote: Hi all, when I send a delete query to SOLR using SolrJ, I receive this exception: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Too many open files 11:53:06,964 INFO [HttpMethodDirector] I/O exception (java.net.SocketException) caught when processing request: Too many open files Could anyone help me? How can I solve this? Thanks
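A sketch of the reuse pattern Michel points to above: keep one client per Solr server and share it across threads, instead of creating one per request (each new client opens its own connections, which leaks file handles under concurrency). SolrServerStub is a hypothetical stand-in used only to keep the sketch self-contained; a real application would hold an org.apache.solr.client.solrj.impl.CommonsHttpSolrServer here, which the SolrJ wiki describes as safe to share:

```java
public class SharedSolrClient {

    // Hypothetical stand-in for the real SolrJ client type.
    public static class SolrServerStub {
        public final String url;
        public SolrServerStub(String url) { this.url = url; }
    }

    // One instance for the whole JVM: created once, reused by every thread,
    // so no new sockets or file handles are opened per request.
    private static final SolrServerStub INSTANCE =
            new SolrServerStub("http://localhost:8983/solr");

    public static SolrServerStub get() {
        return INSTANCE;
    }

    public static void main(String[] args) {
        // Every caller sees the same object.
        System.out.println(SharedSolrClient.get() == SharedSolrClient.get()); // prints "true"
    }
}
```

On the transaction concern raised in the thread: a Solr commit is index-wide no matter which client connection issues it, so sharing one client instance does not change commit visibility; documents from concurrent writers become visible together at the next commit either way.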
Re: Too Many Open Files
Another question: why doesn't SolrJ close the StringWriter and OutputStreamWriter? Thanks 2010/6/28 Anderson vasconcelos anderson.v...@gmail.com Thanks for the responses. I instantiate one instance per request (per delete query, in my case). I have a lot of concurrent processes. If I reuse the same instance (to send, delete and remove data) in Solr, will I have trouble? My concern is that, if I do this, Solr will commit documents with data from another transaction. Thanks 2010/6/28 Michel Bottan freakco...@gmail.com Hi Anderson, If you are using SolrJ, it's recommended to reuse the same instance per Solr server. http://wiki.apache.org/solr/Solrj#CommonsHttpSolrServer But there are other scenarios which may cause this situation: 1. Another application running in the same Solr JVM which doesn't properly close sockets or file handles. 2. The open files limit configuration is low. Check your limits; read it from the JVM process info: cat /proc/1234/limits (where 1234 is your process ID) Cheers, Michel Bottan On Mon, Jun 28, 2010 at 1:18 PM, Erick Erickson erickerick...@gmail.com wrote: This probably means you're opening new readers without closing old ones. But that's just a guess. I'm guessing that this really has nothing to do with the delete itself, but the delete is what's finally pushing you over the limit.
I know this has been discussed before, try searching the mail archive for TooManyOpenFiles and/or File Handles. You could get much better information by providing more details, see: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, Jun 28, 2010 at 11:56 AM, Anderson vasconcelos anderson.v...@gmail.com wrote: Hi all, when I send a delete query to SOLR using SolrJ, I receive this exception: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Too many open files 11:53:06,964 INFO [HttpMethodDirector] I/O exception (java.net.SocketException) caught when processing request: Too many open files Could anyone help me? How can I solve this? Thanks
Re: SolrUser - ERROR:SCHEMA-INDEX-MISMATCH
Thanks for the help. The fields are just for filtering my data; they are: client_id, instance_id. When I index my data, I include the client's identifier (because my application is multi-client). When I search in Solr, I want to find the docs where client_id:1, for example. With the field as a string, this works. When I saw that I could make the field a long, I thought that might be a better practice. But my trouble is that I already have many documents indexed. Since changing to long now is a bad idea, I will keep the field as a string type. (Correct me if I am wrong.) Thanks 2010/5/13 Erick Erickson erickerick...@gmail.com This is probably a bad idea. You're getting by on backwards compatibility stuff. I'd really recommend that you reindex your entire corpus, possibly getting by on what you already have until you can successfully reindex. Have a look at trie fields (this is detailed in the example schema.xml). Here's another place to look: http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ You also haven't told us what you want to do with the field, so making recommendations is difficult. Best Erick On Thu, May 13, 2010 at 5:19 PM, Anderson vasconcelos anderson.v...@gmail.com wrote: Hi Erick. I put fields with type string in my schema.xml. The system went to production, and now I see that the field must be a long field. When I change the field type to long, the error ERROR:SCHEMA-INDEX-MISMATCH appears when I search via the Solr admin. I put plong, and this works. Is this the way I should go? (Could this cause trouble in the future?) What are the advantages of setting the field type to long? Should I keep this field as a string type? Thanks 2010/5/13 Erick Erickson erickerick...@gmail.com Not at present, you must re-index your documents when you redefine your schema to change existing documents. Field updating of documents already indexed is being worked on, but it's not available yet.
Best Erick On Thu, May 13, 2010 at 3:58 PM, Anderson vasconcelos anderson.v...@gmail.com wrote: Hi all. I have the following fields in my schema:
<field name="uuid_field" type="uuid" indexed="true" stored="true" default="NEW"/>
<field name="entity_instance_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="child_instance_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="client_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="indexing_date" type="date" default="NOW" multiValued="false" indexed="true" stored="true"/>
<field name="field_name" type="textgen" indexed="true" stored="true" required="false"/>
<field name="value" type="textgen" indexed="true" stored="false" required="false"/>
I need to change the SOLR index, adding a dynamic field that will contain all values of the value field. Is it possible to get all the indexed data and reindex it, putting the values in my dynamic field? Since the data was not stored, I can't find a way to do this. Thanks
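Erick's pointer to trie fields above can be sketched as schema entries. The precisionStep value is the common default from the example schema.xml, and (per the thread) switching client_id to this type would still require a full reindex:

```xml
<!-- Trie-encoded long: precisionStep trades index size for faster range
     queries (precisionStep="0" disables the extra precision terms). -->
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>

<field name="client_id" type="tlong" indexed="true" stored="true" required="true"/>
```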
Connection Pool
Hi, I want to know if there is any connection pool client to manage connections with Solr. In my system, we have a lot of concurrent index requests. I can't share my connection; I need to create one per transaction. But if I create one per transaction, I think performance will go down. How do you solve this problem? Thanks
SolrUser - ERROR:SCHEMA-INDEX-MISMATCH
Hi all. I have the following fields in my schema:
<field name="uuid_field" type="uuid" indexed="true" stored="true" default="NEW"/>
<field name="entity_instance_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="child_instance_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="client_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="indexing_date" type="date" default="NOW" multiValued="false" indexed="true" stored="true"/>
<field name="field_name" type="textgen" indexed="true" stored="true" required="false"/>
<field name="value" type="textgen" indexed="true" stored="false" required="false"/>
I need to change the SOLR index, adding a dynamic field that will contain all values of the value field. Is it possible to get all the indexed data and reindex it, putting the values in my dynamic field? Since the data was not stored, I can't find a way to do this. Thanks
SolrUser - Reindex
Why do Solr/Lucene not index the character '@'? I send email fields like x...@gmail.com to be indexed, and afterwards I try to search with to_email:*...@* and nothing is found. Do I need to do some configuration? Thanks
Re: SolrUser - ERROR:SCHEMA-INDEX-MISMATCH
Hi Erick. I put fields with type string in my schema.xml. The system went to production, and now I see that the field must be a long field. When I change the field type to long, the error ERROR:SCHEMA-INDEX-MISMATCH appears when I search via the Solr admin. I put plong, and this works. Is this the way I should go? (Could this cause trouble in the future?) What are the advantages of setting the field type to long? Should I keep this field as a string type? Thanks 2010/5/13 Erick Erickson erickerick...@gmail.com Not at present, you must re-index your documents when you redefine your schema to change existing documents. Field updating of documents already indexed is being worked on, but it's not available yet. Best Erick On Thu, May 13, 2010 at 3:58 PM, Anderson vasconcelos anderson.v...@gmail.com wrote: Hi all. I have the following fields in my schema:
<field name="uuid_field" type="uuid" indexed="true" stored="true" default="NEW"/>
<field name="entity_instance_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="child_instance_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="client_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="indexing_date" type="date" default="NOW" multiValued="false" indexed="true" stored="true"/>
<field name="field_name" type="textgen" indexed="true" stored="true" required="false"/>
<field name="value" type="textgen" indexed="true" stored="false" required="false"/>
I need to change the SOLR index, adding a dynamic field that will contain all values of the value field. Is it possible to get all the indexed data and reindex it, putting the values in my dynamic field? Since the data was not stored, I can't find a way to do this. Thanks
Re: SolrUser - Reindex
I'm using the textgen fieldtype on my field, as follows:
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
...
<dynamicField name="field_value_*" type="textgen" indexed="true" stored="true"/>
...
Doesn't this chain remove the @ symbol? To configure it to index the @ symbol, must I use HTMLStripStandardTokenizerFactory? Thanks 2010/5/13 Erick Erickson erickerick...@gmail.com Probably your analyzer is removing the @ symbol; it's hard to say if you don't include the relevant parts of your schema. This page might help: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Best Erick On Thu, May 13, 2010 at 3:59 PM, Anderson vasconcelos anderson.v...@gmail.com wrote: Why do Solr/Lucene not index the character '@'? I send email fields like x...@gmail.com to be indexed, and afterwards I try to search with to_email:*...@* and nothing is found. Do I need to do some configuration? Thanks
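One hedged way out of the '@' problem discussed above: with the textgen chain, WordDelimiterFilterFactory treats '@' and '.' as delimiters, so an address is split into pieces and the '@' itself is never indexed, which is why a wildcard query containing '@' matches nothing. A side field that keeps the whole address as a single lowercased token avoids this; the names email_raw and to_email_raw are illustrative:

```xml
<fieldType name="email_raw" class="solr.TextField">
  <analyzer>
    <!-- KeywordTokenizer emits the entire input as one token, so the '@'
         survives; only lowercasing is applied. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="to_email_raw" type="email_raw" indexed="true" stored="true"/>
```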