Re: sample_techproducts tutorial (8.1 guide) has wrong collection name?
Thank you, I will fix the image to have the correct collection name. It was confusing to show a collection overview image different from the one you see when following the tutorial.

/Thomas

On Thu, Jun 27, 2019 at 3:45 PM Alexandre Rafalovitch wrote:
> Actually, the tutorial does say "Here’s the first place where we’ll
> deviate from the default options." and the resulting name should be
> techproducts.
>
> It is the image that is no longer correct and needs to be updated. And
> perhaps the text should be made clearer.
>
> A pull request with an updated image (and matching JIRA) would be most
> welcome. As would any comments on the tutorial sequence in general, as
> we haven't touched it for quite a while. In fact, it would be great if
> somebody wanted to flesh out the whole tutorial sequence to bring it
> more in line with recent Solr features.
>
> Regards,
>    Alex.
>
> On Thu, 27 Jun 2019 at 07:42, Thomas Egense wrote:
> >
> > Solr 8.1 tutorial:
> > https://lucene.apache.org/solr/guide/8_1/solr-tutorial.html
> >
> > Following the guide to the point where you have created the collection
> > and checked the admin page, you see the same picture as shown in
> > "Figure 1. SolrCloud Diagram" (collection name = gettingstarted) <---
> >
> > The next step is indexing the techproducts samples:
> >
> >   solr-8.1.0:$ bin/post -c techproducts example/exampledocs/*
> >
> > But this fails, since the collection name is "gettingstarted".
> > Instead you have to index with
> >
> >   bin/post -c gettingstarted example/exampledocs/*
> >
> > In earlier tutorials the collection name was indeed "techproducts",
> > so it is the collection name that has changed.
> >
> > Is it just me doing something wrong? It is hard to believe such an
> > obvious error has not been corrected yet. It seems the 7.1 tutorial
> > has the same error.
> >
> > /Thomas Egense
sample_techproducts tutorial (8.1 guide) has wrong collection name?
Solr 8.1 tutorial: https://lucene.apache.org/solr/guide/8_1/solr-tutorial.html

Following the guide to the point where you have created the collection and checked the admin page, you see the same picture as shown in "Figure 1. SolrCloud Diagram" (collection name = gettingstarted) <---

The next step is indexing the techproducts samples:

  solr-8.1.0:$ bin/post -c techproducts example/exampledocs/*

But this fails, since the collection name is "gettingstarted". Instead you have to index with

  bin/post -c gettingstarted example/exampledocs/*

In earlier tutorials the collection name was indeed "techproducts", so it is the collection name that has changed.

Is it just me doing something wrong? It is hard to believe such an obvious error has not been corrected yet. It seems the 7.1 tutorial has the same error.

/Thomas Egense
Re: [ANN] Lucidworks Fusion 1.0.0
Hi Grant,

Will there be a Fusion demonstration/presentation at Lucene/Solr Revolution DC? (Not listed in the program yet.)

Thomas Egense

On Mon, Sep 22, 2014 at 3:45 PM, Grant Ingersoll gsing...@apache.org wrote:
> Hi All,
>
> We at Lucidworks are pleased to announce the release of Lucidworks Fusion 1.0. Fusion is built to overlay on top of Solr (in fact, you can manage multiple Solr clusters -- think QA, staging and production -- all from our Admin). In other words, if you already have Solr, simply point Fusion at your instance and get all kinds of goodies like Banana (https://github.com/LucidWorks/Banana -- our port of Kibana to Solr, plus a number of extensions that Kibana doesn't have), collaborative-filtering-style recommendations (without the need for Hadoop or Mahout!), a modern signal capture framework, analytics, NLP integration, boosting/blocking and other relevance tools, flexible index- and query-time pipelines, as well as a myriad of connectors ranging from Twitter to web crawling to SharePoint.
>
> The best part of all this? It all leverages the infrastructure that you know and love: Solr. Want recommendations? Deploy more Solr. Want log analytics? Deploy more Solr. Want to track important system metrics? Deploy more Solr.
>
> Fusion represents our commitment as a company to continue to contribute a large quantity of enhancements to the core of Solr while complementing and extending those capabilities with value adds that integrate a number of 3rd-party capabilities (e.g. connectors) and home-grown capabilities like an all-new, responsive UI built in AngularJS. Fusion is not a fork of Solr. We do not hide Solr in any way. In fact, our goal is that your existing applications will work out of the box with Fusion, allowing you to take advantage of new capabilities without overhauling your existing application.
>
> If you want to learn more, please feel free to join our technical webinar on October 2: http://lucidworks.com/blog/say-hello-to-lucidworks-fusion/.
If you'd like to download: http://lucidworks.com/product/fusion/. Cheers, Grant Ingersoll Grant Ingersoll | CTO gr...@lucidworks.com | @gsingers http://www.lucidworks.com
Re: How much free disk space will I need to optimize my index
That is correct, but twice the disk space is theoretically not enough. The worst case is actually three times the storage; I guess this worst case can happen if you also submit new documents to the index while optimizing. I have experienced 2.5 times the disk space during an optimize of a large index: a 1TB index temporarily used 2.5TB of disk space during the optimize (near the end of the optimization).

From,
Thomas Egense

On Wed, Jun 25, 2014 at 8:21 PM, Markus Jelsma markus.jel...@openindex.io wrote:
> -----Original message-----
> From: johnmu...@aol.com
> Sent: Wednesday 25th June 2014 20:13
> To: solr-user@lucene.apache.org
> Subject: How much free disk space will I need to optimize my index
>
> > Hi,
> >
> > I need to de-fragment my index. My question is, how much free disk space do I need before I can do so? My understanding is, I need 1x the size of my current un-optimized index in free disk space before I can optimize it. Is this true?
>
> Yes, 20 GB of FREE space to force merge an existing 20 GB index.
>
> > That is, let's say my index is 20 GB (un-optimized); then I must have 20 GB of free disk space to make sure the optimization is successful. The reason for this is that during optimization the index is re-written (is this the case?) and, if it is already optimized, the re-write will create a new 20 GB index before it deletes the old one (is this true?), thus why there must be at least 20 GB of free disk space.
> >
> > Can someone help me with this or point me to a wiki on this topic?
> >
> > Thanks!!!
> >
> > - MJ
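The numbers in this thread can be turned into a rough rule of thumb. The sketch below is only a back-of-the-envelope helper: the 3x worst-case factor is the anecdotal figure from this discussion (a 1 TB index peaking at 2.5 TB of transient usage), not a documented Solr guarantee.

```python
def optimize_free_space_needed(index_size_gb, safety_factor=3.0):
    """Estimate free disk space (GB) needed to optimize (force-merge) an index.

    A merge rewrites the index, so 1x the index size is the bare minimum.
    Concurrent indexing while optimizing can push transient usage toward
    3x, as reported in this thread (1 TB index -> 2.5 TB peak usage).
    safety_factor is an illustrative assumption, not a Solr guarantee.
    """
    return index_size_gb * safety_factor

# MJ's example: a 20 GB index.
print(optimize_free_space_needed(20))                      # plan for 60 GB
print(optimize_free_space_needed(1000, safety_factor=2.5)) # the 1 TB case observed
```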
Re: Problem faceting
First of all, make sure you use docValues for facet fields with many unique values. If that still does not help, you can try the following. My colleague Toke Eskildsen has made a huge improvement for faceting IF the number of results in the facets is less than 8% of the total number of documents. In that case we get a substantial improvement in both memory use and query time. See: https://plus.google.com/+TokeEskildsen/posts/7oGxWZRKJEs

We have tested it on an index with 300M documents.

From,
Thomas Egense

On Wed, Jun 11, 2014 at 5:36 PM, marcos palacios mpcmar...@gmail.com wrote:
> Hello everyone,
>
> I'm having problems with the performance of queries with facets; the time spent resolving a query is very high. The index has 10 million documents, each one with 100 fields. The server has 8 cores and 56 GB of RAM, running with Jetty with this memory configuration: -Xms24096m -Xmx44576m
>
> When I do a query with 20 facets, the time spent is 4-5 seconds, and it stays just as slow if the same request is made again.
>
> Debug query, first execution:
>   <double name="time">6037.0</double>
>   <lst name="query"><double name="time">265.0</double></lst>
>   <lst name="facet"><double name="time">5772.0</double></lst>
>
> Debug query, second execution:
>   <double name="time">6037.0</double>
>   <lst name="query"><double name="time">1.0</double></lst>
>   <lst name="facet"><double name="time">4872.0</double></lst>
>
> What can I do? Why are the facets not cached?
>
> Thank you, Marcos
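For reference, enabling docValues is a schema.xml change on the facet fields. The fragment below is a sketch; the field names are hypothetical examples, not taken from Marcos's index, and changing docValues on an existing field requires reindexing.

```
<!-- schema.xml sketch: hypothetical facet fields with docValues enabled.
     Field names are illustrative only; reindex after changing this. -->
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
<field name="brand"    type="string" indexed="true" stored="true" docValues="true"/>
```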
Re: How to set the shardid?
You can specify the shard in core.properties, e.g.:

core.properties:
  name=collection2
  shard=shard2

Did this solve it?

From,
Thomas Egense

On Mon, Feb 25, 2013 at 5:13 PM, Mark Miller markrmil...@gmail.com wrote:
> On Feb 25, 2013, at 10:00 AM, Markus.Mirsberger markus.mirsber...@gmx.de wrote:
> > How can I fix the shardId used at one server when I create a collection? (I'm using the SolrJ collections API to create collections.)
>
> You can't do it with the collections API currently. If you want to control the shard names explicitly, you have to use the CoreAdmin API to create each core - that lets you set the shard id.
>
> - Mark
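For context, a slightly fuller core.properties along these lines pins a core to a specific shard of a collection. The property values here are illustrative, not from the thread; `name`, `collection`, and `shard` are standard core properties.

```
# core.properties sketch -- values are illustrative
name=collection2_shard2_replica1
collection=collection2
shard=shard2
```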
Re: Minor bug with CloudSolrServer and collection-alias.
Thanks to both of you for fixing the bug. Impressive response time for the fix (7 hours).

Thomas Egense

On Wed, Oct 23, 2013 at 7:16 PM, Mark Miller markrmil...@gmail.com wrote:
> I filed https://issues.apache.org/jira/browse/SOLR-5380 and just committed a fix.
>
> - Mark
>
> On Oct 23, 2013, at 11:15 AM, Shawn Heisey s...@elyograg.org wrote:
> > On 10/23/2013 3:59 AM, Thomas Egense wrote:
> > > Using cloudSolrServer.setDefaultCollection(collectionId) does not work as intended for an alias spanning more than one collection. The virtual collection-alias collectionId is recognized as an existing collection, but it only queries one of the collections it is mapped to.
> > >
> > > You can confirm this easily in AliasIntegrationTest. The test class AliasIntegrationTest creates two cores with 2 and 3 different documents, and then creates an alias pointing to both of them. Line 153:
> > >
> > >   // search with new cloud client
> > >   CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
> > >   cloudSolrServer.setParallelUpdates(random().nextBoolean());
> > >   query = new SolrQuery("*:*");
> > >   query.set("collection", "testalias");
> > >   res = cloudSolrServer.query(query);
> > >   cloudSolrServer.shutdown();
> > >   assertEquals(5, res.getResults().getNumFound());
> > >
> > > No unit-test bug here. However, if you set the collection id not on the query but on CloudSolrServer instead, it will produce the bug:
> > >
> > >   // search with new cloud client
> > >   CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
> > >   cloudSolrServer.setDefaultCollection("testalias");
> > >   cloudSolrServer.setParallelUpdates(random().nextBoolean());
> > >   query = new SolrQuery("*:*");
> > >   // query.set("collection", "testalias");
> > >   res = cloudSolrServer.query(query);
> > >   cloudSolrServer.shutdown();
> > >   assertEquals(5, res.getResults().getNumFound());  // -- Assertion failure
> > >
> > > Should I create a Jira issue for this?
Thomas, I have confirmed this with the following test patch, which adds to the test rather than changing what's already there: http://apaste.info/9ke5

I'm about to head off to the train station to start my commute, so I will be unavailable for a little while. If you haven't gotten the jira filed by the time I get to another computer, I will create it.

Thanks,
Shawn
Minor bug with CloudSolrServer and collection-alias.
I found this bug in both 4.4 and 4.5.

Using cloudSolrServer.setDefaultCollection(collectionId) does not work as intended for an alias spanning more than one collection. The virtual collection-alias collectionId is recognized as an existing collection, but it only queries one of the collections it is mapped to.

You can confirm this easily in AliasIntegrationTest. The test class AliasIntegrationTest creates two cores with 2 and 3 different documents, and then creates an alias pointing to both of them. Line 153:

  // search with new cloud client
  CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
  cloudSolrServer.setParallelUpdates(random().nextBoolean());
  query = new SolrQuery("*:*");
  query.set("collection", "testalias");
  res = cloudSolrServer.query(query);
  cloudSolrServer.shutdown();
  assertEquals(5, res.getResults().getNumFound());

No unit-test bug here. However, if you set the collection id not on the query but on CloudSolrServer instead, it will produce the bug:

  // search with new cloud client
  CloudSolrServer cloudSolrServer = new CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
  cloudSolrServer.setDefaultCollection("testalias");
  cloudSolrServer.setParallelUpdates(random().nextBoolean());
  query = new SolrQuery("*:*");
  // query.set("collection", "testalias");
  res = cloudSolrServer.query(query);
  cloudSolrServer.shutdown();
  assertEquals(5, res.getResults().getNumFound());  // -- Assertion failure

Should I create a Jira issue for this?

From,
Thomas Egense
SolrCloud: scale-test by duplicating the same index to all shards and making it behave as if each shard's index were different (uniqueId).
Hello everyone,

I have a small challenge performance-testing a SolrCloud setup. I have 10 shards, and each shard is supposed to have an index size of ~200GB. However, I only have a single index of 200GB, because it would take too long to build another index with different data, and I hope to somehow use this index on all 10 shards and make it behave as if the documents are different on each shard. So building more indexes from new data is not an option.

Making a query to a SolrCloud is a two-phase operation. First all shards receive the query and return IDs and rankings. The merger then removes duplicate IDs, and the full documents are retrieved. When I copy this index to all shards and make a request, the following will happen:

Phase one: all shards receive the query and return IDs+rankings (actually the same set from all shards). This part is realistic enough.

Phase two: the IDs are merged, so retrieving the documents is not realistic compared to documents actually spread out between shards (IO-wise).

Is there any way I can 'fake' this somehow and have the shards return a prefixed ID in phase one, which then also has to be undone when retrieving the documents in phase two? I have tried making the hack in org.apache.solr.handler.component.QueryComponent and a few other classes, but no success (the result sets are always empty). I do not need to index any new documents, which would also be a challenge with this hack due to the ID hash intervals of the shards.

Does anyone have a good idea how to make this hack work?

From,
Thomas Egense