Re: sample_techproducts tutorial (8.1 guide) has wrong collection name?

2019-06-27 Thread Thomas Egense
Thank you,
 I will fix the image to have the correct collection name. It was confusing
that the image showed a different collection overview
than the one you see when following the tutorial.
/Thomas

On Thu, Jun 27, 2019 at 3:45 PM Alexandre Rafalovitch 
wrote:

> Actually, the tutorial does say "Here’s the first place where we’ll
> deviate from the default options." and the resulting collection name
> should be techproducts.
>
> It is the image that is no longer correct and needs to be updated. And
> perhaps the text should be made clearer.
>
> A pull request with updated image (and matching JIRA) would be most
> welcome. As would any comments on the tutorial sequence in general, as
> we haven't touched it for quite a while. In fact, it would be great if
> somebody wanted to flesh out the whole tutorial sequence to bring it more
> in line with recent Solr features.
>
> Regards,
>    Alex.
>
> On Thu, 27 Jun 2019 at 07:42, Thomas Egense 
> wrote:
> >
> > Solr 8.1 tutorial:
> > https://lucene.apache.org/solr/guide/8_1/solr-tutorial.html
> >
> > Following the guide to where you have created the collection and
> > checking the admin page, you get the same picture as shown in
> > "Figure 1. SolrCloud Diagram"
> > (collection name = gettingstarted) <---
> >
> > Next step is indexing the tech-products samples:
> > solr-8.1.0:$ bin/post -c techproducts example/exampledocs/*
> >
> > But this fails, since the collection name is "gettingstarted"
> >
> > Instead you have to index with
> > bin/post -c gettingstarted example/exampledocs/*
> >
> > In earlier tutorials the collection name was indeed "techproducts", so
> > it is the collection name that has changed.
> >
> > Is it just me doing something wrong? It is hard to believe such an
> > obvious error has not been corrected yet. It seems the 7.1 tutorial has
> > the same error.
> >
> > /Thomas Egense
>


sample_techproducts tutorial (8.1 guide) has wrong collection name?

2019-06-27 Thread Thomas Egense
Solr 8.1 tutorial:
https://lucene.apache.org/solr/guide/8_1/solr-tutorial.html

Following the guide to where you have created the collection and checking
the admin page, you get the same picture as shown in
"Figure 1. SolrCloud Diagram"
(collection name = gettingstarted) <---

Next step is indexing the tech-products samples:
solr-8.1.0:$ bin/post -c techproducts example/exampledocs/*

But this fails, since the collection name is "gettingstarted"

Instead you have to index with
bin/post -c gettingstarted example/exampledocs/*

In earlier tutorials the collection name was indeed "techproducts", so it
is the collection name that has changed.
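
(If you want to end up with the collection name the later tutorial steps
assume, here is a sketch of a possible fix. The bin/solr create and delete
commands are standard, but the -s/-rf values are my assumption about the
tutorial's defaults:

bin/solr delete -c gettingstarted
bin/solr create -c techproducts -s 2 -rf 2
bin/post -c techproducts example/exampledocs/*
)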

Is it just me doing something wrong? It is hard to believe such an obvious
error has not been corrected yet. It seems the 7.1 tutorial has the same
error.

/Thomas Egense


Re: [ANN] Lucidworks Fusion 1.0.0

2014-09-23 Thread Thomas Egense
Hi Grant.
Will there be a Fusion demonstration/presentation at Lucene/Solr Revolution
DC? (Not listed in the program yet).


Thomas Egense

On Mon, Sep 22, 2014 at 3:45 PM, Grant Ingersoll gsing...@apache.org
wrote:

 Hi All,

 We at Lucidworks are pleased to announce the release of Lucidworks Fusion
 1.0.   Fusion is built to overlay on top of Solr (in fact, you can manage
 multiple Solr clusters -- think QA, staging and production -- all from our
 Admin). In other words, if you already have Solr, simply point Fusion at
 your instance and get all kinds of goodies like Banana (
 https://github.com/LucidWorks/Banana -- our port of Kibana to Solr + a
 number of extensions that Kibana doesn't have), collaborative filtering
 style recommendations (without the need for Hadoop or Mahout!), a modern
 signal capture framework, analytics, NLP integration, Boosting/Blocking and
 other relevance tools, flexible index and query time pipelines as well as a
 myriad of connectors ranging from Twitter to web crawling to Sharepoint.
 The best part of all this?  It all leverages the infrastructure that you
 know and love: Solr.  Want recommendations?  Deploy more Solr.  Want log
 analytics?  Deploy more Solr.  Want to track important system metrics?
 Deploy more Solr.

 Fusion represents our commitment as a company to continue to contribute a
 large quantity of enhancements to the core of Solr while complementing and
 extending those capabilities with value adds that integrate a number of 3rd
 party (e.g. connectors) and home-grown capabilities like an all-new,
 responsive UI built in AngularJS.  Fusion is not a fork of Solr.  We do not
 hide Solr in any way.  In fact, our goal is that your existing applications
 will work out of the box with Fusion, allowing you to take advantage of new
 capabilities w/o overhauling your existing application.

 If you want to learn more, please feel free to join our technical webinar
 on October 2: http://lucidworks.com/blog/say-hello-to-lucidworks-fusion/.
 If you'd like to download: http://lucidworks.com/product/fusion/.

 Cheers,
 Grant Ingersoll

 
 Grant Ingersoll | CTO
 gr...@lucidworks.com | @gsingers
 http://www.lucidworks.com




Re: How much free disk space will I need to optimize my index

2014-06-26 Thread Thomas Egense
That is correct, but twice the disk space is theoretically not enough.
Worst case is actually three times the storage; I guess this worst case can
happen if you also submit new documents to the index while optimizing.
I have experienced 2.5 times the disk space during an optimize of a large
index: a 1TB index temporarily used 2.5TB of disk space during the
optimize (near the end of the optimization).
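
(A quick sanity check before forcing a merge, as a sketch; the Solr home
path and core name here are assumptions, adjust them to your setup:

du -sh /var/solr/data/collection1/data/index   # current index size
df -h /var/solr                                # free space: aim for 2-3x the index size
curl 'http://localhost:8983/solr/collection1/update?optimize=true&maxSegments=1'
)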

From,
Thomas Egense


On Wed, Jun 25, 2014 at 8:21 PM, Markus Jelsma markus.jel...@openindex.io
wrote:





 -Original message-
  From:johnmu...@aol.com johnmu...@aol.com
  Sent: Wednesday 25th June 2014 20:13
  To: solr-user@lucene.apache.org
  Subject: How much free disk space will I need to optimize my index
 
  Hi,
 
 
  I need to de-fragment my index.  My question is, how much free disk
  space do I need before I can do so?  My understanding is, I need 1X free
  disk space of my current un-optimized index size before I can optimize it.
  Is this true?

 Yes, 20 GB of FREE space to force merge an existing 20 GB index.

 
 
  That is, let's say my index is 20 GB (un-optimized); then I must have 20 GB
 of free disk space to make sure the optimization is successful.  The reason
 for this is because during optimization the index is re-written (is this
 the case?) and if it is already optimized, the re-write will create a new
 20 GB index before it deletes the old one (is this true?), thus why there
 must be at least 20 GB free disk space.
 
 
  Can someone help me with this or point me to a wiki on this topic?
 
 
  Thanks!!!
 
 
  - MJ
 



Re: Problem faceting

2014-06-12 Thread Thomas Egense
First of all, make sure you use docValues for facet fields with many unique
values.
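
(For illustration, a minimal schema.xml sketch; the field name is made up,
docValues="true" is the relevant part, and enabling it on an existing field
requires re-indexing:

<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
)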

If that still does not help you can try the following.
My colleague Toke Eskildsen has made a huge improvement to faceting IF the
number of results in the facets is less than 8% of the total number of
documents.
In this case we get a substantial improvement in both memory use and query
time:
See: https://plus.google.com/+TokeEskildsen/posts/7oGxWZRKJEs
We have tested it on an index with 300M documents.

From,
Thomas Egense



On Wed, Jun 11, 2014 at 5:36 PM, marcos palacios mpcmar...@gmail.com
wrote:

 Hello everyone.



 I’m having problems with the performance of queries with facets; the time
 spent resolving a query is very high.



 The index has 10 million documents, each one with 100 fields.

 The server has 8 cores and 56 GB of RAM, running with Jetty with this
 memory configuration: -Xms24096m -Xmx44576m



 When I do a query with 20 facets, the time spent is 4-5 seconds. If the
 same request is made another time, the facet time stays almost as high
 (see the debug output below).



 Debug query first execution:

 <double name="time">6037.0</double>
 <lst name="query"><double name="time">265.0</double></lst>
 <lst name="facet"><double name="time">5772.0</double></lst>



 Debug query seconds executions:

 <double name="time">6037.0</double>
 <lst name="query"><double name="time">1.0</double></lst>
 <lst name="facet"><double name="time">4872.0</double></lst>





 What can I do? Why are the facets not cached?





 Thank you, Marcos



Re: How to set the shardid?

2013-10-29 Thread Thomas Egense
You can specify the shard in core.properties, i.e.:
core.properties:
name=collection2
shard=shard2
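
(Alternatively, a sketch of the CoreAdmin CREATE call Mark describes below;
the core name and config name here are made-up examples:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=collection2_shard2_replica1&collection=collection2&shard=shard2&collection.configName=myconf'
)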

Did this solve it?

From,
Thomas Egense




On Mon, Feb 25, 2013 at 5:13 PM, Mark Miller markrmil...@gmail.com wrote:


 On Feb 25, 2013, at 10:00 AM, Markus.Mirsberger 
 markus.mirsber...@gmx.de wrote:

  How can I fix the shardId used at one server when I create a collection?
  (I'm using the SolrJ Collections API to create collections)

 You can't do it with the collections API currently. If you want to control
 the shard names explicitly, you have to use the CoreAdmin API to create
 each core - that lets you set the shard id.

 - Mark


Re: Minor bug with CloudSolrServer and collection-alias.

2013-10-24 Thread Thomas Egense
Thanks to both of you for fixing the bug. Impressive response time for the
fix (7 hours).

Thomas Egense


On Wed, Oct 23, 2013 at 7:16 PM, Mark Miller markrmil...@gmail.com wrote:

 I filed https://issues.apache.org/jira/browse/SOLR-5380 and just
 committed a fix.

 - Mark

 On Oct 23, 2013, at 11:15 AM, Shawn Heisey s...@elyograg.org wrote:

  On 10/23/2013 3:59 AM, Thomas Egense wrote:
  Using cloudSolrServer.setDefaultCollection(collectionId) does not work
  as intended for an alias spanning more than 1 collection.
  The virtual collection-alias collectionId is recognized as an existing
  collection, but it only queries one of the collections it is mapped to.
 
  You can confirm this easily in AliasIntegrationTest.
 
  The test-class AliasIntegrationTest creates two cores with 2 and 3
  different documents, and then creates an alias pointing to both of them.
 
  Line 153:
 // search with new cloud client
 CloudSolrServer cloudSolrServer = new
  CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
 cloudSolrServer.setParallelUpdates(random().nextBoolean());
 query = new SolrQuery("*:*");
 query.set("collection", "testalias");
 res = cloudSolrServer.query(query);
 cloudSolrServer.shutdown();
 assertEquals(5, res.getResults().getNumFound());
 
  No unit-test bug here; however, if you change it from setting the
  collection id on the query to setting it on CloudSolrServer instead, it
  will produce the bug:
 
 // search with new cloud client
 CloudSolrServer cloudSolrServer = new
 CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
 cloudSolrServer.setDefaultCollection(testalias);
 cloudSolrServer.setParallelUpdates(random().nextBoolean());
 query = new SolrQuery("*:*");
 //query.set("collection", "testalias");
 res = cloudSolrServer.query(query);
 cloudSolrServer.shutdown();
 assertEquals(5, res.getResults().getNumFound());  <-- Assertion failure
 
  Should I create a Jira issue for this?
 
  Thomas,
 
  I have confirmed this with the following test patch, which adds to the
  test rather than changing what's already there:
 
  http://apaste.info/9ke5
 
  I'm about to head off to the train station to start my commute, so I
  will be unavailable for a little while.  If you haven't gotten the jira
  filed by the time I get to another computer, I will create it.
 
  Thanks,
  Shawn
 




Minor bug with CloudSolrServer and collection-alias.

2013-10-23 Thread Thomas Egense
I found this bug in both 4.4 and 4.5

Using cloudSolrServer.setDefaultCollection(collectionId) does not work as
intended for an alias spanning more than 1 collection.
The virtual collection-alias collectionId is recognized as an existing
collection, but it only queries one of the collections it is mapped to.

You can confirm this easily in AliasIntegrationTest.

The test-class AliasIntegrationTest creates two cores with 2 and 3 different
documents, and then creates an alias pointing to both of them.

Line 153:
// search with new cloud client
CloudSolrServer cloudSolrServer = new
CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
cloudSolrServer.setParallelUpdates(random().nextBoolean());
query = new SolrQuery("*:*");
query.set("collection", "testalias");
res = cloudSolrServer.query(query);
cloudSolrServer.shutdown();
assertEquals(5, res.getResults().getNumFound());

No unit-test bug here; however, if you change it from setting the
collection id on the query to setting it on CloudSolrServer instead, it
will produce the bug:

// search with new cloud client
CloudSolrServer cloudSolrServer = new
CloudSolrServer(zkServer.getZkAddress(), random().nextBoolean());
cloudSolrServer.setDefaultCollection(testalias);
cloudSolrServer.setParallelUpdates(random().nextBoolean());
query = new SolrQuery("*:*");
//query.set("collection", "testalias");
res = cloudSolrServer.query(query);
cloudSolrServer.shutdown();
assertEquals(5, res.getResults().getNumFound());  <-- Assertion failure
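
(Until this is fixed, a workaround sketch: keep the alias on the request
itself, as in the passing snippet further up, rather than as the default
collection:

SolrQuery query = new SolrQuery("*:*");
query.set("collection", "testalias");  // per-request alias resolves correctly
QueryResponse res = cloudSolrServer.query(query);
)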

Should I create a Jira issue for this?

From,
Thomas Egense


SolrCloud. Scale-test by duplicating the same index to the shards and making it behave as if each index is different (uniqueId).

2013-10-01 Thread Thomas Egense
Hello everyone,
I have a small challenge performance testing a SolrCloud setup. I have 10
shards, and each shard is supposed to have an index size of ~200GB. However,
I only have a single 200GB index, because it would take too long to build
another index with different data, and I hope to somehow use this index on
all 10 shards and make it behave as if all documents were different on each
shard. So building more indexes from new data is not an option.

Making a query to a SolrCloud is a two-phase operation. First, all shards
receive the query and return IDs and ranking. The merger will then remove
duplicate IDs, and then the full documents will be retrieved.
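
(You can watch the two phases with standard distributed debugging, as a
sketch; the host and collection name are assumptions:

curl 'http://localhost:8983/solr/collection1/select?q=*:*&shards.info=true&debug=track'

In my experience the track section lists the per-shard requests per stage:
EXECUTE_QUERY returning ids+score, then GET_FIELDS fetching the documents.)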

When I copy this index to all shards and make a request, the following will
happen: Phase one: all shards will receive the query and return ids+ranking
(actually the same set from all shards). This part is realistic enough.
Phase two: IDs will be merged, and retrieving the documents is not as
realistic as if they were spread out between the shards (IO-wise).

Is there any way I can 'fake' this somehow and have shards return a
prefixed_id for phase 1 etc., which would then have to be undone when
retrieving the documents for phase 2? I have tried making the hack in
org.apache.solr.handler.component.QueryComponent and a few other classes,
but with no success (the result set is always empty). I do not need to index
any new documents, which would also be a challenge with this hack due to the
ID hash-interval of the shards.

Does anyone have a good idea how to make this hack work?

From,
Thomas Egense