is group.query supported in solrcloud (4.8) ?
hello, I have a collection 0_2014_10_11 made of three shards. When I try a group.query, even specifying a single shard, I get this error:

shard 0 did not set sort field values (FieldDoc.fields is null); you must pass fillFields=true to IndexSearcher.search on each shard

This is the request, asking the collection to find groups on IDCat3=922:

http://src-dev-1:8080/solr/0_2014_10_11/select?q=*:*&group=true&group.query=IDCat3%3A922&shards=src-dev-1:8080/solr/0_2014_10_11_shard1_replica2

According to this page [ https://cwiki.apache.org/confluence/display/solr/Result+Grouping ] group.query is supported. Am I missing some key parameter? Should the shards parameter really be mandatory? It seems that with group.field it is not required. Thanks Giovanni
grouping finds <result name="doclist" numFound="0"/>
Sorry for the basic question.

q=*:*&fq=-sku:2471834&fq=FiltroDispo:1&fq=has_image:1&rows=100&fl=descCat3,IDCat3,ranking2&group=true&group.field=IDCat3&group.sort=ranking2+desc&group.ngroups=true

returns some groups with no results. I'm using solr 4.8.0, the collection has 3 shards. Am I missing some parameters?

<lst name="grouped">
  <lst name="IDCat3">
    <int name="matches">297254</int>
    <int name="ngroups">49</int>
    <arr name="groups">
      <lst>
        <int name="groupValue">0</int>
        <result name="doclist" numFound="0" start="0"/>
      </lst>
      ...
      <lst>
        <int name="groupValue">12043</int>
        <result name="doclist" numFound="2" start="0">
          <doc>
            <int name="IDCat3">12043</int>
            <str name="descCat3">SSD</str>
            <int name="ranking2">498</int>
          </doc>
        </result>
      </lst>
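As an aside, requests like the ones above are easiest to get right when the parameter string is built and URL-encoded programmatically rather than by hand. A minimal stdlib-only sketch (the class and method names are made up for illustration):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupQueryUrl {
    // Join parameters with '&' and URL-encode each value, as Solr expects.
    static String buildParams(Map<String, String> params) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("group", "true");
        p.put("group.query", "IDCat3:922");
        // ':' is encoded as %3A, matching the request quoted in the thread
        System.out.println(buildParams(p));
    }
}
```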
Re: unstable results on refresh
My user interface shows some boxes to describe result categories. After half a day of small updates and deletes I noticed, with various queries, that the boxes started swapping while browsing. For sure I relied too much on getting the same results on each call; now I'm keeping the categories order in request parameters to avoid the blink effect while browsing. The optimize process is really slow, and I can't use it. Since I have many other parameters that should be carried along the request to make sure that the navigation is consistent, I would like to understand if there is a setup that can limit the idf change and keep it low enough. I tried with

<indexConfig>
  <mergeFactor>5</mergeFactor>
</indexConfig>

in solrconfig, but this morning /solr/admin/cores?action=STATUS still reports a number of segments above ten for all cores of the shard. (I'm sure I have reloaded each core after changing the value.) Now I'm trying with expungeDeletes called from solrj, but still I don't see the segment count decrease:

UpdateRequest commitRequest = new UpdateRequest();
// setAction(action, waitFlush, waitSearcher, maxSegments, softCommit, expungeDeletes)
commitRequest.setAction(ACTION.COMMIT, true, true, 10, false, true);
commitRequest.process(solrServer);

2014-10-22 15:48 GMT+02:00 Erick Erickson erickerick...@gmail.com:

I would rather ask whether such small differences matter enough to do this. Is this something users will _ever_ notice? Optimization is quite a heavyweight operation, and is generally not recommended on indexes that change often, and 5 minutes is certainly below the recommendation for optimizing. There is/has been work done on distributed IDF that should address this (I think), but I don't quite know the current status. But other than in a test setup, is it worth the effort?
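For reference, in Solr 4.x the <mergeFactor> element is shorthand for a couple of TieredMergePolicy settings (TieredMergePolicy being the 4.x default); spelling them out explicitly looks roughly like the sketch below. The values are illustrative, not a recommendation:

```xml
<indexConfig>
  <!-- equivalent of mergeFactor=5 under the default TieredMergePolicy -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">5</int>
    <int name="segmentsPerTier">5</int>
  </mergePolicy>
</indexConfig>
```

Note that these settings only shape future merges; they do not force existing segments to merge down, which is why the segment count may not drop right after a reload.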
Best, Erick

On Wed, Oct 22, 2014 at 3:54 AM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote:

I have made some small patches to the application to make this problem less visible, and I'm trying to perform the optimize once per hour; yesterday it took 5 minutes, this morning 15 minutes. Today I will collect some statistics, but the publication process sends documents every 5 minutes, and I think the optimize is taking too much time. I have no default mergeFactor configured for this collection; do you think that setting it to a small value could improve the situation? If I have understood well, having to merge segments will keep similar stats on all nodes. It's ok to have the indexing process a little bit slower.

2014-10-21 18:44 GMT+02:00 Erick Erickson erickerick...@gmail.com:

Giovanni: To see how this happens, consider a shard with a leader and two followers. Assume your autocommit interval is 60 seconds on each. This interval can expire at slightly different wall clock times. Even if the servers started perfectly in sync, they can get slightly out of sync. So, you index a bunch of docs and these replicas close the current segment and re-open a new segment with slightly different contents. Now docs come in that replace older docs. The tf/idf statistics _include_ deleted document data (which is purged on optimize). Given that doc X can be in different segments (or, more accurately, segments that get merged at different times on different machines), replica 1 may have slightly different stats than replica 2, thus computing slightly different scores. Optimizing purges all data related to deleted documents, so it all regularizes itself on optimize. Best, Erick

On Tue, Oct 21, 2014 at 11:08 AM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote:

I noticed the problem again, and now I was able to collect some data. In my paste http://pastebin.com/nVwf327c you can see the result of the same query issued twice; the 2nd and 3rd groups are swapped.
I pasted also the clusterstate and the core state for each core. The logs didn't show any problem related to indexing, only some malformed queries. After doing an optimize the problem disappeared. So, is the problem related to documents that were deleted from the index? The optimization took 5 minutes to complete.

2014-10-21 11:41 GMT+02:00 Giovanni Bricconi giovanni.bricc...@banzai.it:

Nice! I will monitor the index and try this if the problem comes back. Actually the problem was due to small differences in score, so I think it has the same origin.

2014-10-21 8:10 GMT+02:00 lboutros boutr...@gmail.com:

Hi Giovanni, we had this problem as well. The cause was that the different nodes had slightly different idf values. We solved this problem by doing an optimize operation, which really removes suppressed data. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/unstable-results-on-refresh
Re: unstable results on refresh
I have made some small patches to the application to make this problem less visible, and I'm trying to perform the optimize once per hour; yesterday it took 5 minutes, this morning 15 minutes. Today I will collect some statistics, but the publication process sends documents every 5 minutes, and I think the optimize is taking too much time. I have no default mergeFactor configured for this collection; do you think that setting it to a small value could improve the situation? If I have understood well, having to merge segments will keep similar stats on all nodes. It's ok to have the indexing process a little bit slower.

2014-10-21 18:44 GMT+02:00 Erick Erickson erickerick...@gmail.com:

Giovanni: To see how this happens, consider a shard with a leader and two followers. Assume your autocommit interval is 60 seconds on each. This interval can expire at slightly different wall clock times. Even if the servers started perfectly in sync, they can get slightly out of sync. So, you index a bunch of docs and these replicas close the current segment and re-open a new segment with slightly different contents. Now docs come in that replace older docs. The tf/idf statistics _include_ deleted document data (which is purged on optimize). Given that doc X can be in different segments (or, more accurately, segments that get merged at different times on different machines), replica 1 may have slightly different stats than replica 2, thus computing slightly different scores. Optimizing purges all data related to deleted documents, so it all regularizes itself on optimize. Best, Erick

On Tue, Oct 21, 2014 at 11:08 AM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote:

I noticed the problem again, and now I was able to collect some data. In my paste http://pastebin.com/nVwf327c you can see the result of the same query issued twice; the 2nd and 3rd groups are swapped. I pasted also the clusterstate and the core state for each core.
The logs didn't show any problem related to indexing, only some malformed queries. After doing an optimize the problem disappeared. So, is the problem related to documents that were deleted from the index? The optimization took 5 minutes to complete.

2014-10-21 11:41 GMT+02:00 Giovanni Bricconi giovanni.bricc...@banzai.it:

Nice! I will monitor the index and try this if the problem comes back. Actually the problem was due to small differences in score, so I think it has the same origin.

2014-10-21 8:10 GMT+02:00 lboutros boutr...@gmail.com:

Hi Giovanni, we had this problem as well. The cause was that the different nodes had slightly different idf values. We solved this problem by doing an optimize operation, which really removes suppressed data. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/unstable-results-on-refresh-tp4164913p4165086.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: unstable results on refresh
I noticed the problem looking at a group query: the groups returned were sorted on the score field of their first result and then shown to the user. Repeating the same query, I noticed that the order of two groups started switching. Thank you, I will look for the thread you mentioned.

2014-10-20 22:07 GMT+02:00 Alexandre Rafalovitch arafa...@gmail.com:

What are the differences on? The document count or things like facets? This could be important. Also, I think there was a similar thread on the mailing list a week or two ago; it might be worth looking for it. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 20 October 2014 04:49, Giovanni Bricconi giovanni.bricc...@banzai.it wrote:

Hello, I have a procedure that sends small data changes during the day to a solrcloud cluster, version 4.8. The cluster is made of three nodes and three shards; each node contains two shards. The procedure has been running for days; I don't know when, but at some point one of the cores went out of sync, and repeating the same query began to show small differences. The core graph was not useful, everything seemed active. I have solved the problem by reindexing everything, because the collection is quite small, but is there a way to fix this problem? Suppose I can figure out which core returns different results: is there a command to force that core to refetch the whole index from its master? Thanks Giovanni
Re: unstable results on refresh
Nice! I will monitor the index and try this if the problem comes back. Actually the problem was due to small differences in score, so I think it has the same origin.

2014-10-21 8:10 GMT+02:00 lboutros boutr...@gmail.com:

Hi Giovanni, we had this problem as well. The cause was that the different nodes had slightly different idf values. We solved this problem by doing an optimize operation, which really removes suppressed data. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/unstable-results-on-refresh-tp4164913p4165086.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: unstable results on refresh
I noticed the problem again, and now I was able to collect some data. In my paste http://pastebin.com/nVwf327c you can see the result of the same query issued twice; the 2nd and 3rd groups are swapped. I pasted also the clusterstate and the core state for each core. The logs didn't show any problem related to indexing, only some malformed queries. After doing an optimize the problem disappeared. So, is the problem related to documents that were deleted from the index? The optimization took 5 minutes to complete.

2014-10-21 11:41 GMT+02:00 Giovanni Bricconi giovanni.bricc...@banzai.it:

Nice! I will monitor the index and try this if the problem comes back. Actually the problem was due to small differences in score, so I think it has the same origin.

2014-10-21 8:10 GMT+02:00 lboutros boutr...@gmail.com:

Hi Giovanni, we had this problem as well. The cause was that the different nodes had slightly different idf values. We solved this problem by doing an optimize operation, which really removes suppressed data. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/unstable-results-on-refresh-tp4164913p4165086.html Sent from the Solr - User mailing list archive at Nabble.com.
unstable results on refresh
Hello, I have a procedure that sends small data changes during the day to a solrcloud cluster, version 4.8. The cluster is made of three nodes and three shards; each node contains two shards. The procedure has been running for days; I don't know when, but at some point one of the cores went out of sync, and repeating the same query began to show small differences. The core graph was not useful, everything seemed active. I have solved the problem by reindexing everything, because the collection is quite small, but is there a way to fix this problem? Suppose I can figure out which core returns different results: is there a command to force that core to refetch the whole index from its master? Thanks Giovanni
Re: solrcloud indexing completed event
Thank you Erick. Fortunately I can modify the data feeding process to start my post-indexing tasks.

2014-06-30 22:13 GMT+02:00 Erick Erickson erickerick...@gmail.com:

The paradigm is different. In SolrCloud, when a client sends an indexing request to any node in the system, by the time the response comes back all the nodes (leaders, followers, etc.) have _all_ received the update and processed it. So you don't have to care in the same way. As far as different segments, versions, and all that, this is entirely expected. Considering the above: packet -> leader, leader -> follower. Each of them is independently indexing the documents; there is no replication. So, since the two servers started at different times, things like the autocommit interval can kick in at different times and the indexes diverge in terms of segment counts, version numbers, whatever. They'll return the same _documents_, but FWIW, Erick

On Mon, Jun 30, 2014 at 7:55 AM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote:

Hello, I have one application that queries solr; when the index version changes this application has to redo some tasks. Since I have more than one solr server, I would like to start these tasks when all solr nodes are synchronized. With a master/slave configuration the application simply watched http://myhost:8080/solr/admin/cores?action=STATUS&core=0bis on each solr node and checked that the commit time msec was equal. When the time changes and becomes equal on all the nodes, the replication is complete and it is safe to restart the tasks. Now I would like to switch to a solrcloud configuration, splitting the core 0bis into 3 shards, with 2 replicas for each shard. After refeeding the collection I tried the same approach, calling http://myhost:8080/solr/admin/cores?action=STATUS&core=0bis_shard3_replica2 for each core of the collection, but to my surprise I found that on replicas of the same shard the version of the index, the number of segments, and even the commit time msec were different!
I was thinking that it was possible to check some parameter on each shard's cores to verify that everything was up to date, but this does not seem to be true. Is it possible somehow to capture the event of the commit being done on every core of the collection? Thank you Giovanni
solrcloud indexing completed event
Hello, I have one application that queries solr; when the index version changes this application has to redo some tasks. Since I have more than one solr server, I would like to start these tasks when all solr nodes are synchronized. With a master/slave configuration the application simply watched http://myhost:8080/solr/admin/cores?action=STATUS&core=0bis on each solr node and checked that the commit time msec was equal. When the time changes and becomes equal on all the nodes, the replication is complete and it is safe to restart the tasks. Now I would like to switch to a solrcloud configuration, splitting the core 0bis into 3 shards, with 2 replicas for each shard. After refeeding the collection I tried the same approach, calling http://myhost:8080/solr/admin/cores?action=STATUS&core=0bis_shard3_replica2 for each core of the collection, but to my surprise I found that on replicas of the same shard the version of the index, the number of segments, and even the commit time msec were different! I was thinking that it was possible to check some parameter on each shard's cores to verify that everything was up to date, but this does not seem to be true. Is it possible somehow to capture the event of the commit being done on every core of the collection? Thank you Giovanni
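The master/slave-era check described above (poll each core's STATUS and wait until every node reports the same commit time) boils down to a comparison like the sketch below. Fetching and parsing the STATUS response is omitted, and as the reply notes, the premise does not hold per-replica in SolrCloud; the class and method names are made up:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class SyncCheck {
    // "In sync" in the sense used above: every polled core reports
    // the same commit time msec.
    static boolean allCommitTimesEqual(Collection<Long> commitTimesMsec) {
        return commitTimesMsec.stream().distinct().count() <= 1;
    }

    public static void main(String[] args) {
        List<Long> drifting = Arrays.asList(1414000000L, 1414000000L, 1414000555L);
        List<Long> settled  = Arrays.asList(1414000555L, 1414000555L, 1414000555L);
        System.out.println(allCommitTimesEqual(drifting)); // false
        System.out.println(allCommitTimesEqual(settled));  // true
    }
}
```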
Re: solr cloud 4.8, synonymfilterfactory and big dictionaries
Thank you Elaine, split files worked for me too.

2014-05-06 19:15 GMT+02:00 Cario, Elaine elaine.ca...@wolterskluwer.com:

Hi Giovanni, I had the same issue just last week! I worked around it temporarily by segmenting the file into 1 MB files, and then using a comma-delimited list of files in the filter specification in the schema. There is a known issue around this: https://issues.apache.org/jira/browse/SOLR-4793 ...and presumably there is a param you can set in zookeeper and solr (jute.maxbuffer) to override the 1 MB limit. I didn't have enough time to test that out (and it's not clear to me what form the value should take); at the time it was easier for me to brute-force the files.

-----Original Message----- From: Giovanni Bricconi [mailto:giovanni.bricc...@banzai.it] Sent: Tuesday, May 06, 2014 12:11 PM To: solr-user Subject: solr cloud 4.8, synonymfilterfactory and big dictionaries

Hello, I am migrating an application to solrcloud and I have to deal with a big dictionary, about 10 MB. It seems that I can't upload it to zookeeper. Is there a way of specifying an external file for the synonyms parameter? Can I compress the file or split it into many small files? I have the same problem for SnowballPorterFilterFactory. Thanks
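A sketch of Elaine's workaround: split the big synonyms file on line boundaries into chunks under ZooKeeper's 1 MB default node size, then list the chunk files comma-delimited in the filter's synonyms attribute. Class and output file names here are made up; a single line longer than the limit still becomes its own (oversized) chunk:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SynonymSplitter {
    // Split `input` into numbered sibling files, each at most maxBytes,
    // breaking only between lines so no synonym rule is cut in half.
    static List<Path> split(Path input, long maxBytes) throws IOException {
        List<Path> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int n = 0;
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            long currentBytes = current.toString().getBytes(StandardCharsets.UTF_8).length;
            long lineBytes = (line + "\n").getBytes(StandardCharsets.UTF_8).length;
            if (current.length() > 0 && currentBytes + lineBytes > maxBytes) {
                chunks.add(write(input, ++n, current.toString()));
                current.setLength(0);
            }
            current.append(line).append('\n');
        }
        if (current.length() > 0) chunks.add(write(input, ++n, current.toString()));
        return chunks;
    }

    private static Path write(Path input, int n, String content) throws IOException {
        Path out = input.resolveSibling("synonyms_part" + n + ".txt");
        Files.write(out, content.getBytes(StandardCharsets.UTF_8));
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("synonyms", ".txt");
        Files.write(in, "a,b\nabxpower => abx power\nc,d\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(split(in, 10).size()); // 3 chunks
    }
}
```

The resulting files would then be referenced in schema.xml as, e.g., synonyms="synonyms_part1.txt,synonyms_part2.txt".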
solr cloud 4.8, synonymfilterfactory and big dictionaries
Hello, I am migrating an application to solrcloud and I have to deal with a big dictionary, about 10 MB. It seems that I can't upload it to zookeeper. Is there a way of specifying an external file for the synonyms parameter? Can I compress the file or split it into many small files? I have the same problem for SnowballPorterFilterFactory. Thanks
Re: Solr relevancy tuning
Hello Doug, I have just watched the Quepid demonstration video, and I strongly agree with your introduction: it is very hard to involve marketing/business people in repeated testing sessions, and spreadsheets or other kinds of files are not the right tool to use. Currently I'm quite alone in my tuning task, and having a visual approach could be beneficial for me; you are giving me many good inputs! I see that kelvin (my scripted tool) and Quepid follow the same path. In Quepid someone quickly watches the results and applies colours to them; in kelvin you enter one or more queries (network cable, ethernet cable) and state that the results must contain ethernet in the title, or must come from a list of product categories. I also do diffs of results, before and after changes, to check what is going on; but I have to do that in a very unix-scripted way. Have you considered placing a counter of total red/bad results in Quepid? I use this index to get a quick overview of the impact of changes across all queries. Actually I repeat tests in production from time to time, and if I see the kelvin temperature rising (the number of errors going up) I know I have to check what's going on, because new products may be having a bad impact on the index. I also keep counters of products with low-quality images or no images at all, or with too-short listings; sometimes they are useful to understand better what will happen if you change some bq/fq in the application. I see also that after changes in Quepid someone has to check gray results and assign them a colour; in kelvin's case sometimes the conditions can do a bit of magic (new product names still contain SM-G900F) but sometimes can introduce false errors (the new product name contains only Galaxy 5 and not the product code SM-G900F). So some checks are needed, but with Quepid everybody can do the check; with kelvin you have to change some lines of a script, and not everybody is able/willing to do that.
The idea of a static index is a good suggestion; I will try it in the next round of search engine improvements. Thank you Doug!

2014-04-09 17:48 GMT+02:00 Doug Turnbull dturnb...@opensourceconnections.com:

Hey Giovanni, nice to meet you. I'm the person that did the Test Driven Relevancy talk. We've got a product, Quepid (http://quepid.com), that lets you gather good/bad results for queries and do a sort of test-driven development against search relevancy. Sounds similar to your existing scripted approach. Have you considered keeping a static catalog for testing purposes? We had a project with a lot of updates and date-dependent relevancy. This lets you create some test scenarios against a static data set. However, one downside is that you can't recreate problems from production in your test setup exactly; you have to find a similar issue that reflects what you're seeing. Cheers, -Doug

On Wed, Apr 9, 2014 at 10:42 AM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote:

Thank you for the links. The book is really useful; I will definitely have to spend some time reformatting the logs to get access to the number of results found, session id, and much more. I'm also quite happy that my test cases produce results similar to the precision reports shown at the beginning of the book. Giovanni

2014-04-09 12:59 GMT+02:00 Ahmet Arslan iori...@yahoo.com:

Hi Giovanni, here are some relevant pointers:
http://www.lucenerevolution.org/2013/Test-Driven-Relevancy-How-to-Work-with-Content-Experts-to-Optimize-and-Maintain-Search-Relevancy
http://rosenfeldmedia.com/books/search-analytics/
http://www.sematext.com/search-analytics/index.html
Ahmet

On Wednesday, April 9, 2014 12:17 PM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote:

I have been working for about one year on an e-commerce site, and unfortunately I have no information retrieval background, so I am probably missing some important practices about relevance tuning and search engines.
During this period I had to fix many bugs about bad search results, which I solved sometimes by tuning edismax weights, sometimes by creating ad hoc query filters or query boosting; but I am still not able to figure out what the correct process to improve search result relevance should be. These are the practices I am following; I would really appreciate any comments about them, and any hints about the practices you follow in your projects:

- In order to have a measure of search quality, I have written many test cases such as: if the user searches for "nike sport watch", the results should display at least four TomTom products with the words nike and sportwatch in the title. I have written a tool that reads such tests from json files, applies them to my application, and then counts the number of results that do not match the criteria stated in the test cases. (For those interested, this tool is available at https://github.com/gibri/kelvin but it is still quite a prototype.)
Solr relevancy tuning
I have been working for about one year on an e-commerce site, and unfortunately I have no information retrieval background, so I am probably missing some important practices about relevance tuning and search engines. During this period I had to fix many bugs about bad search results, which I solved sometimes by tuning edismax weights, sometimes by creating ad hoc query filters or query boosting; but I am still not able to figure out what the correct process to improve search result relevance should be. These are the practices I am following; I would really appreciate any comments about them, and any hints about the practices you follow in your projects:

- In order to have a measure of search quality, I have written many test cases such as: if the user searches for "nike sport watch", the results should display at least four TomTom products with the words nike and sportwatch in the title. I have written a tool that reads such tests from json files, applies them to my application, and then counts the number of results that do not match the criteria stated in the test cases. (For those interested, this tool is available at https://github.com/gibri/kelvin but it is still quite a prototype.)
- I use this count as a quality index; I have tried various times to change the edismax weights to lower the overall number of errors, or to add new filters/boostings to the application to try to decrease the error count.
- The pro of this is that at least you have a number to look at, and a quick way of checking the impact of a modification.
- The bad side is that you have to maintain the test cases: now I have about 800 tests and my product catalogue changes often; this implies that some products exit the catalog, and some test cases can't pass anymore.
- I am populating the test cases using errors reported by users, and I feel that this is driving the test cases too much toward pathological cases. Moreover, I don't have many tests for cases that are working well now.
I would like to use search logs as drivers to generate tests, but I feel I haven't picked the right path. Using top queries, manually reviewing results, and then writing tests is a slow process; moreover many top queries are ambiguous or are driven by site ads. Many, many queries are unique per user. How do you deal with these cases? How are you using your logs to find out which test cases to fix? Are you looking for queries where the user does not open any returned result? Which KPI have you chosen to find queries that are not providing good results? And what are you using as a KPI for the whole search, besides the conversion rate? Can you suggest any other practices you are using in your projects? Thank you very much in advance, Giovanni
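The kind of assertion described above ("at least four products with nike in the title") can be sketched as a small predicate over the returned titles. The class, method, and sample titles are made up for illustration; kelvin's actual JSON test format is richer than this:

```java
import java.util.Arrays;
import java.util.List;

public class RelevancyCheck {
    // True when at least `minHits` of the returned titles contain `word`
    // (case-insensitive), mirroring one kelvin-style test condition.
    static boolean atLeast(List<String> titles, String word, int minHits) {
        long hits = titles.stream()
                .filter(t -> t.toLowerCase().contains(word.toLowerCase()))
                .count();
        return hits >= minHits;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "TomTom Nike+ SportWatch GPS",
                "Nike SportWatch strap",
                "Garmin Forerunner",
                "Nike sport band");
        System.out.println(atLeast(titles, "nike", 3));       // true
        System.out.println(atLeast(titles, "sportwatch", 3)); // false
    }
}
```

A test suite would run many such predicates against live query results and report the total failure count, which is the "quality index" the message describes.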
Re: Solr relevancy tuning
Thank you for the links. The book is really useful; I will definitely have to spend some time reformatting the logs to get access to the number of results found, session id, and much more. I'm also quite happy that my test cases produce results similar to the precision reports shown at the beginning of the book. Giovanni

2014-04-09 12:59 GMT+02:00 Ahmet Arslan iori...@yahoo.com:

Hi Giovanni, here are some relevant pointers:
http://www.lucenerevolution.org/2013/Test-Driven-Relevancy-How-to-Work-with-Content-Experts-to-Optimize-and-Maintain-Search-Relevancy
http://rosenfeldmedia.com/books/search-analytics/
http://www.sematext.com/search-analytics/index.html
Ahmet

On Wednesday, April 9, 2014 12:17 PM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote:

I have been working for about one year on an e-commerce site, and unfortunately I have no information retrieval background, so I am probably missing some important practices about relevance tuning and search engines. During this period I had to fix many bugs about bad search results, which I solved sometimes by tuning edismax weights, sometimes by creating ad hoc query filters or query boosting; but I am still not able to figure out what the correct process to improve search result relevance should be. These are the practices I am following; I would really appreciate any comments about them, and any hints about the practices you follow in your projects:

- In order to have a measure of search quality, I have written many test cases such as: if the user searches for "nike sport watch", the results should display at least four TomTom products with the words nike and sportwatch in the title. I have written a tool that reads such tests from json files, applies them to my application, and then counts the number of results that do not match the criteria stated in the test cases.
(For those interested, this tool is available at https://github.com/gibri/kelvin but it is still quite a prototype.)
- I use this count as a quality index; I have tried various times to change the edismax weights to lower the overall number of errors, or to add new filters/boostings to the application to try to decrease the error count.
- The pro of this is that at least you have a number to look at, and a quick way of checking the impact of a modification.
- The bad side is that you have to maintain the test cases: now I have about 800 tests and my product catalogue changes often; this implies that some products exit the catalog, and some test cases can't pass anymore.
- I am populating the test cases using errors reported by users, and I feel that this is driving the test cases too much toward pathological cases. Moreover, I don't have many tests for cases that are working well now.

I would like to use search logs as drivers to generate tests, but I feel I haven't picked the right path. Using top queries, manually reviewing results, and then writing tests is a slow process; moreover many top queries are ambiguous or are driven by site ads. Many, many queries are unique per user. How do you deal with these cases? How are you using your logs to find out which test cases to fix? Are you looking for queries where the user does not open any returned result? Which KPI have you chosen to find queries that are not providing good results? And what are you using as a KPI for the whole search, besides the conversion rate? Can you suggest any other practices you are using in your projects? Thank you very much in advance, Giovanni
question about synonymfilter
hello, suppose I have this synonym

abxpower => abx power

and suppose you are indexing "abxpower pipp". From the analyzer I see that abxpower is split into two words, but the second word, power, overlaps the next one:

text      raw_bytes                  keyword  position  start  end  type  positionLength
abxpower  [61 62 78 70 6f 77 65 72]  false    1         0      8    word  1
pipp      [70 69 70 70]              false    2         9      14   word  1

SF:
text   raw_bytes          positionLength  type     start  end  position  keyword
abx    [61 62 78]         1               SYNONYM  0      8    1         false
pipp   [70 69 70 70]      1               word     9      14   2         false
power  [70 6f 77 65 72]   1               SYNONYM  9      14   2         false

Is this correct? I noticed that WordDelimiterFilter instead changes start, end and position. This is what happens for "abx-power pippo":

WDF:
text   raw_bytes          start  end  type  position  positionLength
abx    [61 62 78]         0      3    word  1         1
power  [70 6f 77 65 72]   4      9    word  2         1
pippo  [70 69 70 70 6f]   10     15   word  3         1
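For context, a sketch of the kind of analyzer chain that produces the output above (the field type name and file name are made up). The overlapping positions shown for the multi-word expansion are a known consequence of SynonymFilter emitting a flat token stream, which cannot fully represent multi-token synonyms:

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- a rule like "abxpower => abx power" expands one token into two,
         stacking the extra token onto the following position -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```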
Re: Data import handler with multi tables
maybe:

<entity name="bothTables"
        query="select concat('A.',id) id, id originalId, nameA from tbl_tableA
               union all
               select concat('B.',id) id, id originalId, nameA from tbl_tableB">
  <field name="id" column="id"/>
  <field name="originalId" column="originalId"/> <!-- need a new field -->
  <field name="nameA" column="nameA"/>
</entity>

So you can keep the original id; maybe also add an originalTable field if you don't like parsing the id column to discover the table from which the data was read.

2013/10/29 Stefan Matheis matheis.ste...@gmail.com:

I've never looked for another way; what's the problem with using a compound key?

On Monday, October 28, 2013 at 1:38 PM, dtphat wrote:

Hi, is there no other way to import all data for this case instead of using a compound key? Thanks. - Phat T. Dong -- View this message in context: http://lucene.472066.n3.nabble.com/Re-Data-import-handler-with-multi-tables-tp4098048p4098056.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
howto increase indexing speed?
I have a small solr setup, not even on a physical machine but a vmware virtual machine with a single cpu, that reads data using DIH from a database. The machine has no physical disks attached but stores data on a NetApp NAS. Currently this machine indexes 320 documents/sec; not bad, but we plan to double the index size and would like to keep nearly the same rate. Doing some basic checks during the indexing, I found with iostat that disk usage is nearly 8% and the source database is running fine; instead the virtual cpu is 95% busy running solr. Now I could quite easily add another virtual cpu to the solr box, but as far as I know this won't help because DIH doesn't work in parallel. Am I wrong? What would you do? Rewrite the feeding process, quitting DIH and using solrj to feed data in parallel? Or would you instead keep DIH and switch to a sharded configuration? Thank you for any hints, Giovanni
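Since DIH is single-threaded, one of the options raised above is a SolrJ feeder that sends batches from several threads (SolrJ 4.x also ships ConcurrentUpdateSolrServer for this). A minimal sketch of just the batching and threading, with the actual network call abstracted behind a callback so nothing here depends on a running cluster; the class name and parameters are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ParallelFeeder {
    // Partition `ids` into batches of `batchSize` and hand each batch to
    // `send` on a fixed thread pool; `send` would wrap e.g. a SolrJ add().
    // Returns the number of batches dispatched.
    static int feed(List<String> ids, int batchSize, int threads,
                    Consumer<List<String>> send) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger batches = new AtomicInteger();
        for (int i = 0; i < ids.size(); i += batchSize) {
            List<String> batch = new ArrayList<>(
                    ids.subList(i, Math.min(i + batchSize, ids.size())));
            pool.submit(() -> { send.accept(batch); batches.incrementAndGet(); });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return batches.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) ids.add("doc" + i);
        AtomicInteger sent = new AtomicInteger();
        int batches = feed(ids, 100, 4, b -> sent.addAndGet(b.size()));
        System.out.println(batches + " batches, " + sent.get() + " docs");
    }
}
```

With a CPU-bound single feeder at 95%, spreading the client-side document preparation across threads is the usual first step before resorting to sharding.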
Re: ClassNotFoundException regarding SolrInfoMBean under Tomcat 7
I saw something similar when I placed some jar in tomcat/lib (data import handler); the right place was instead WEB-INF/lib. I would try placing all needed jars there. 2013/7/5 Michael Bakonyi kont...@mb-neuemedien.de Hm, can't anybody help me out? I still can't get my installation to run correctly ... What I've found out recently, if I understand it right: SolrInfoMBean somehow has to do with JMX. So I manually activated JMX by inserting <jmx/> in my solrconfig.xml as described here: http://wiki.apache.org/solr/SolrJmx. But nevertheless the same exception still appears ... Cheers, Michael Am 04.07.2013 um 13:02 schrieb Michael Bakonyi: Hi everyone, I'm trying to get the CMS TYPO3 connected with Solr 3.6.2. By now I followed the installation at http://wiki.apache.org/solr/SolrTomcat except that I didn't copy the .war-file into the $SOLR_HOME but referenced it at a different location via a Tomcat Context fragment file. Up to then the Solr server works: I can reach the GUI via URL. To get Solr connected with the CMS I then created a new core folder (btw. can anybody give me kind of a live example of when to use different cores? Until now I still don't really understand the concept of cores ..) by duplicating the example folder, in which I overwrote some files (especially solrconfig.xml) with files offered by the TYPO3 community. I also moved the file solr.xml one level up and edited it (added a core fragment and especially adjusted instanceDir) to get a correct multicore setup like the example multicore setup in the downloaded solr tgz package. But now I get the Java exception java.lang.NoClassDefFoundError: org/apache/solr/core/SolrInfoMBean at java.lang.ClassLoader.defineClass1(Native Method). In the Tomcat log file it additionally says: Caused by: java.lang.ClassNotFoundException: org.apache.solr.core.SolrInfoMBean. My guess is that within the new solrconfig.xml there are calls to classes which aren't included correctly.
There are some libs which are included at the top of this file, but the paths of the references should be ok as I checked them via Bash: at http://wiki.apache.org/solr/SolrConfigXml it is said that the <lib dir="..."/> directory is relative to the instanceDir, so this is what I've checked. I also inserted absolute paths but this wasn't successful either. Can anybody give me a hint how to solve this problem? Would be great :) Cheers, Michael
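For reference, the <lib> directives the wiki describes go at the top of solrconfig.xml and look like this (the directory and jar names below are placeholders, not your actual paths):

```xml
<config>
  <!-- dir is resolved relative to the core's instanceDir; regex filters the jars picked up -->
  <lib dir="../../contrib/dataimporthandler/lib" regex=".*\.jar"/>
  <!-- or point at a single jar with an absolute path -->
  <lib path="/opt/solr/lib/some-plugin.jar"/>
</config>
```

That said, as noted above, jars that the webapp's own classloader must see (like the core Solr classes behind SolrInfoMBean) belong in WEB-INF/lib of the deployed war rather than in tomcat/lib, since classes loaded by the container's loader cannot see classes in the webapp loader.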
Re: Is it possible to find a leader from a list of cores in solr via java code
I have the same question. My purpose is to start the DIH full-import process on the leader and not on a replica. I tried a full import on a replica, but watching the logs it seemed to me that the replica was loading data to send it to the leader, which in turn had to update all the replicas. At least this is what I saw with Solr 4.2.1. Giovanni 2013/7/3 Erick Erickson erickerick...@gmail.com You can always query Zookeeper and find that information out. Take a look at CloudSolrServer, maybe ZkCoreNodeProps etc. for examples; since CloudSolrServer is leader aware, it should have some clues... Or maybe ZkStateReader? I haven't been in that code much, so I can't be more specific... But why do you have this requirement? What do you hope to accomplish? Because this is often the kind of thing that seems more useful than it is... Best Erick On Wed, Jul 3, 2013 at 3:05 AM, vicky desai vicky.de...@germinait.com wrote: Hi, I have a setup of 1 leader and 1 replica and I have a requirement where I need to find the leader core from the collection. Is there an API in SolrJ by means of which this can be achieved? -- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-find-a-leader-from-a-list-of-cores-in-solr-via-java-code-tp4074994.html Sent from the Solr - User mailing list archive at Nabble.com.
Replicas and soft commit
I have recently upgraded our application from Solr 3.6 to Solr 4.2.1, and I have just started learning about soft commits and partial updates. Currently I have one indexing node and 3 replicas of the same core, and every modification goes through a DIH delta import. This is usually ok, but I have some special cases where updates should be made visible very quickly. As I have seen with my first tests, it is possible to send partial updates and soft commits to each replica and to the indexer, and when the indexer gets a hard commit every replica is realigned. Is this the right approach or am I misunderstanding how to use this feature? I don't see soft commit propagation to the replicas when sending updates to the indexer only: is this true, or maybe I haven't changed some configuration files when porting the application to Solr 4? Giovanni
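As a side note, a partial (atomic) update can carry a commitWithin deadline so you don't have to soft-commit each node by hand. A sketch of the update message, POSTed as application/json to /solr/<core>/update?commitWithin=1000 (the field names id and price are made up for illustration):

```json
[ { "id": "doc1", "price": { "set": 99.9 } } ]
```

The "set" modifier rewrites just that field of the stored document; atomic updates require the other fields to be stored and an updateLog to be configured in solrconfig.xml. Note that in a traditional master/slave (non-SolrCloud) setup, replicas only see changes after replication of a hard-committed index, which would explain soft commits not propagating on their own.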
custom facet.sort
I have a string field containing values such as 1khz, 1ghz, 1mhz, etc. I use this field to show a facet; currently I'm showing results in facet.sort=count order. Now I'm asked to reorder the facet according to the unit of measure (khz/mhz/ghz). I also have three or four other custom sort orders to implement. Is it possible to plug in a custom Java class to provide custom facet.sort modes? Thank you Giovanni
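Until/unless a server-side hook is found, one workaround is to request the facet as usual and reorder the constraints on the client. A sketch of a unit-aware comparator — this assumes the labels really follow the "<number><unit>" shape shown above, and the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class FreqFacetSorter {
    // Multipliers for the units we expect in the facet labels
    private static final Map<String, Long> UNIT = Map.of(
        "khz", 1_000L, "mhz", 1_000_000L, "ghz", 1_000_000_000L);

    // "500mhz" -> 500_000_000; labels with an unknown unit sort last
    static long toHertz(String label) {
        for (Map.Entry<String, Long> e : UNIT.entrySet()) {
            if (label.endsWith(e.getKey())) {
                String num = label.substring(0, label.length() - e.getKey().length());
                return Long.parseLong(num.trim()) * e.getValue();
            }
        }
        return Long.MAX_VALUE;
    }

    static List<String> sortByFrequency(List<String> labels) {
        List<String> out = new ArrayList<>(labels);
        out.sort(Comparator.comparingLong(FreqFacetSorter::toHertz));
        return out;
    }

    public static void main(String[] args) {
        // prints [1khz, 500mhz, 1ghz]
        System.out.println(sortByFrequency(List.of("1ghz", "1khz", "500mhz")));
    }
}
```

The same pattern covers the other custom orders: one comparator per order, applied to the facet constraint list returned by SolrJ, without touching the server.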
Re: Solr and OpenPipe
Nice! I see we're having fun. On 28/mar/2013 17:11, Fabio Curti fabio.cu...@gmail.com wrote: git clone https://github.com/kolstae/openpipe cd openpipe mvn install regards -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-OpenPipe-tp484777p4052079.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 4 plugins
This is really interesting! Do you know if these added fields can be used in sorting or faceting? Thanks On 23/dic/2012 14:08, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Look into writing a custom SearchComponent. Otis Solr ElasticSearch Support http://sematext.com/ On Dec 23, 2012 2:07 AM, Eyal Ben-Meir eya...@gmail.com wrote: Hi all, I want to use Solr 4 as a full text search engine, but I need one of the query fields to get its answer not from the Lucene engine but from my own engine. The rest should continue as normal. Any ideas how to do it? Thanks.
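For context, wiring a custom SearchComponent into a handler, as Otis suggests, looks roughly like this in solrconfig.xml (the component name and class below are made up for illustration):

```xml
<searchComponent name="myExternalComponent" class="com.example.MyExternalComponent"/>

<requestHandler name="/myselect" class="solr.SearchHandler">
  <arr name="last-components">
    <str>myExternalComponent</str>
  </arr>
</requestHandler>
```

Running as a last-component, the custom class sees the result set the standard components produced and can decorate each document with extra fields. Whether such added fields can drive sorting or faceting is a different matter: sorting and faceting run against indexed fields inside the earlier components, so values attached after the fact would not participate without further work.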