how soft-commit works

2013-09-16 Thread Matteo Grolla
searcher? -Is it a good idea to set openSearcher=false in auto commit and rely on soft auto commit to see new data in searches? thanks Matteo Grolla
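
A minimal solrconfig.xml sketch of the setup being asked about (the intervals are illustrative, not from the thread): hard commits only flush to disk, while soft commits make new documents visible to searches.

    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit every 60s, for durability -->
      <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commit -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>5000</maxTime>             <!-- soft commit every 5s, new docs become searchable -->
    </autoSoftCommit>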

Improving indexing performance

2013-10-06 Thread Matteo Grolla
I'd like to have some suggestions on how to improve the indexing performance in the following scenario: I'm uploading 1M docs to solr, every doc has id (a sequential number), title (a small string), date, and body (1 KB of text). Here are my benchmarks (they are all

Re: Improving indexing performance

2013-10-08 Thread Matteo Grolla
this quite frequently, 15 seconds seems quite reasonable. Best, Erick On Sun, Oct 6, 2013 at 12:19 PM, Matteo Grolla matteo.gro...@gmail.com wrote: I'd like to have some suggestion on how to improve the indexing performance on the following scenario I'm uploading 1M docs to solr, every

How to size document cache

2013-10-25 Thread Matteo Grolla
Hi, I'd really appreciate if you could give me some help understanding how to tune the document cache. My thoughts: min values: max_results * max_concurrent_queries, as stated by http://wiki.apache.org/solr/SolrCaching how can I estimate max_concurrent_queries?
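
A worked example of that lower bound, with made-up numbers: if queries request at most rows=1000 and roughly 8 queries run concurrently, the documentCache should hold at least 1000 * 8 = 8000 entries, e.g.:

    <!-- size >= max_results * max_concurrent_queries (1000 * 8, rounded up to a power of two) -->
    <documentCache class="solr.LRUCache" size="8192" initialSize="8192" autowarmCount="0"/>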

interpretation of cat_rank in http://people.apache.org/~hossman/ac2012eu/

2014-05-05 Thread Matteo Grolla
Hi everybody can anyone give me a suitable interpretation for cat_rank in http://people.apache.org/~hossman/ac2012eu/ slide 15 thanks

Re: interpretation of cat_rank in http://people.apache.org/~hossman/ac2012eu/

2014-05-06 Thread Matteo Grolla
Thanks a lot, and thanks for pointing me at the video, I missed it. Matteo On 5 May 2014, at 20:44, Chris Hostetter wrote: : Hi everybody : can anyone give me a suitable interpretation for cat_rank in : http://people.apache.org/~hossman/ac2012eu/ slide 15 Have

query(subquery, default) filters results

2014-05-06 Thread Matteo Grolla
Hi everybody, I'm having trouble with the function query query(subquery, default) http://wiki.apache.org/solr/FunctionQuery#query running this http://localhost:8983/solr/select?q=query($qq,1)&qq={!dismax qf=text}hard drive on collection1 gives me no results but I

Re: query(subquery, default) filters results

2014-05-15 Thread Matteo Grolla
Thanks very much, I realized too late that I skipped an important part of the wiki documentation: this example assumes defType=func. Thanks a lot. On 6 May 2014, at 21:05, Yonik Seeley wrote: On Tue, May 6, 2014 at 5:08 AM, Matteo Grolla matteo.gro...@gmail.com

how to fully test a response writer

2014-07-23 Thread Matteo Grolla
Hi, I developed a new Solr response writer but I'm not happy with how I wrote the tests. My problem is that I need to test it both with local requests and with distributed requests, since the Solr response objects (input to the response writer) are different. a) I tested the local request case

order of updates

2014-11-03 Thread Matteo Grolla
Hi, can anybody confirm this? If I add multiple documents with the same id but differing on other fields, and then issue a commit (no commits before this), the last added document gets indexed, right? Assuming solr 4 and default settings for optimistic locking. Matteo
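
A sketch of the scenario with curl (collection and field names are hypothetical): two adds with the same id followed by a single commit; only the last version is expected back.

    curl 'http://localhost:8983/solr/collection1/update' -H 'Content-Type: application/json' \
      -d '[{"id": "1", "title_s": "first version"}]'
    curl 'http://localhost:8983/solr/collection1/update' -H 'Content-Type: application/json' \
      -d '[{"id": "1", "title_s": "second version"}]'
    curl 'http://localhost:8983/solr/collection1/update' -H 'Content-Type: application/json' \
      -d '{"commit": {}}'
    # expect numFound=1 with title_s = "second version"
    curl 'http://localhost:8983/solr/collection1/select?q=id:1'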

Re: order of updates

2014-11-03 Thread Matteo Grolla
Thanks really a lot Yonik! On 3 Nov 2014, at 15:51, Yonik Seeley wrote: On Mon, Nov 3, 2014 at 8:53 AM, Matteo Grolla matteo.gro...@gmail.com wrote: HI, can anybody give me a confirm? If I add multiple document with the same id but differing on other fields

add and then delete same document before commit,

2014-11-05 Thread Matteo Grolla
Can anyone tell me the behavior of solr (and if it's consistent) when I do what follows: 1) add document x 2) delete document x 3) commit. I've tried with solr 4.5.0 and document x gets indexed. Matteo
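
The same sequence as curl commands (collection name is hypothetical). On the versions discussed later in this thread (4.10.3) the expected outcome is that document x is absent after the commit; the 4.5.0 behaviour reported here is the anomaly being asked about.

    curl 'http://localhost:8983/solr/collection1/update' -H 'Content-Type: application/json' \
      -d '{"add": {"doc": {"id": "x"}}}'        # 1) add document x
    curl 'http://localhost:8983/solr/collection1/update' -H 'Content-Type: application/json' \
      -d '{"delete": {"id": "x"}}'              # 2) delete document x
    curl 'http://localhost:8983/solr/collection1/update' -H 'Content-Type: application/json' \
      -d '{"commit": {}}'                       # 3) commit
    curl 'http://localhost:8983/solr/collection1/select?q=id:x'   # expect numFound=0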

Re: add and then delete same document before commit,

2014-11-05 Thread Matteo Grolla
-Original Message- From: Matteo Grolla Sent: Wednesday, November 5, 2014 4:47 AM To: solr-user@lucene.apache.org Subject: add and then delete same document before commit, Can anyone tell me the behavior of solr (and if it's consistent) when I do what follows: 1) add document x 2

scanning all documents in the collection

2015-02-02 Thread Matteo Grolla
Hi, I'm thinking about having an instance of solr (SolrA) with all fields stored and just id indexed in addition with a normal production instance of solr (SolrB) that is used for the searches. This would allow me to read only what changed from previous crawl, update SolrA and send the

Re: scanning all documents in the collection

2015-02-02 Thread Matteo Grolla
Wow!!! Thanks Joe! On 2 Feb 2015, at 15:05, Joseph Obernberger wrote: I have a similar use-case. Check out the export capability and using cursorMark. -Joe On 2/2/2015 8:14 AM, Matteo Grolla wrote: Hi, I'm thinking about having an instance of solr
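
A hedged sketch of the two approaches Joseph mentions (collection and field names are made up): cursorMark pages through the whole result set of a normal handler, while /export streams every match for docValues fields.

    # deep paging with cursorMark: the sort must end on the uniqueKey field
    curl 'http://localhost:8983/solr/collection1/select?q=*:*&rows=1000&sort=id+asc&cursorMark=*'
    # ...then repeat, passing the nextCursorMark value returned by each response

    # streaming export (the fields in fl must have docValues)
    curl 'http://localhost:8983/solr/collection1/export?q=*:*&sort=id+asc&fl=id,last_modified_dt'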

solrcloud nodes registering as 127.0.1.1

2015-01-12 Thread Matteo Grolla
Hi, hope someone can help me troubleshoot this issue. I'm trying to setup a solrcloud cluster with -zookeeper on 192.168.1.8 (osx mac) -solr1 on 192.168.1.10 (virtualized ubuntu running on mac) -solr2 on 192.168.1.3 (ubuntu on another pc) the problem is

Re: solrcloud nodes registering as 127.0.1.1

2015-01-12 Thread Matteo Grolla
Solved! Ubuntu has an entry like this in /etc/hosts: 127.0.1.1 hostname. To properly run SolrCloud one must substitute 127.0.1.1 with a real (possibly permanent) IP address. On 12 Jan 2015, at 12:47, Matteo Grolla wrote: Hi, hope someone can help me
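
The fix as a before/after of the /etc/hosts line (192.168.1.10 is the solr1 address from the original message; the hostname is hypothetical):

    # before: the node registers itself in ZooKeeper as 127.0.1.1
    127.0.1.1      solr1-host

    # after: use the machine's real, stable LAN address
    192.168.1.10   solr1-host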

stats component performance

2015-04-27 Thread Matteo Grolla
Hi, is there any public benchmark or description of how the solr stats component works? Matteo
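
For context, a typical stats component request (field name is hypothetical); the question is how Solr computes these aggregates and at what cost.

    # returns min, max, sum, count, missing, mean, stddev, ... for the price field
    curl 'http://localhost:8983/solr/collection1/select?q=*:*&rows=0&stats=true&stats.field=price'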

please confirm: pseudo join queries can only be performed on fields of exactly the same type

2015-05-18 Thread Matteo Grolla
Hi, I tried performing a join query {!join from=fA to=fB} where fA was a string field and fB was a text field using KeywordTokenizer. It doesn't work, but it does if both fields are string or both are text. If you confirm this is the correct behavior I'll
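
A sketch of the kind of query being tested (collection, fields and inner query are hypothetical): results come from the "to" side, restricted by documents matching the inner query on the "from" side.

    # works when fA and fB are both string or both text with the same analysis;
    # the reported failure is a string field joined against a KeywordTokenizer text field
    curl 'http://localhost:8983/solr/collection1/select' \
      --data-urlencode 'q={!join from=fA to=fB}category:books'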

Re: please confirm: pseudo join queries can only be performed on fields of exactly the same type

2015-05-18 Thread Matteo Grolla
used the keywordTokenizer, was there other analysis such as lowercasing going on? -Yonik On Mon, May 18, 2015 at 10:26 AM, Matteo Grolla matteo.gro...@gmail.com wrote: Hi, I tried performing a join query {!join from=fA to=fB} where fA was string and fB was text

Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-30 Thread Matteo Grolla
I wouldn't add the complexity you're talking about, especially at the volumes you're talking. Best, Erick On Thu, May 21, 2015 at 3:20 AM, Matteo Grolla matteo.gro...@gmail.com wrote: Hi I'd like some feedback on how I'd like to solve the following sharding problem I have

optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting

2015-05-21 Thread Matteo Grolla
Hi I'd like some feedback on how I'd like to solve the following sharding problem I have a collection that will eventually become big Average document size is 1.5kb Every year 30 Million documents will be indexed Data come from different document producers (a person, owner of his documents)
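
With the default compositeId router, grouping one producer's documents on the same shard is done by prefixing the document id with the producer id and a "!" separator (ids below are hypothetical); the resulting shard can later be divided with SPLITSHARD.

    # all docs of producer "p42" hash to the same shard
    curl 'http://localhost:8983/solr/collection1/update?commit=true' -H 'Content-Type: application/json' \
      -d '[{"id": "p42!doc-1", "body_t": "..."},
           {"id": "p42!doc-2", "body_t": "..."}]'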

splitshard on live node: performance impact

2015-07-06 Thread Matteo Grolla
Hi, what is the performance impact of issuing a splitshard on a live node used for searches?
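
For reference, the operation being asked about (collection and shard names are hypothetical); it runs against the live leader of the shard, which is why its impact on concurrent searches matters.

    curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'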

restore quorum after majority of zk nodes down

2015-10-29 Thread Matteo Grolla
I'm designing a solr cloud installation where nodes from a single cluster are distributed on 2 datacenters which are close and very well connected. Let's say that zk nodes zk1, zk2 are on DC1 and zk3 is on DC2, and let's say that DC1 goes down and the cluster is left with zk3. How can I restore a

Re: restore quorum after majority of zk nodes down

2015-10-29 Thread Matteo Grolla
ver.wunderwood.org/ (my blog) > > > > On Oct 29, 2015, at 10:08 AM, Matteo Grolla <matteo.gro...@gmail.com> > wrote: > > > > I'm designing a solr cloud installation where nodes from a single cluster > > are distributed on 2 datacenters which are close and ve

Re: restore quorum after majority of zk nodes down

2015-10-30 Thread Matteo Grolla
o take all the zookeeper nodes > down. > > -- Pushkar Raste > On Oct 29, 2015 4:33 PM, "Matteo Grolla" <matteo.gro...@gmail.com> wrote: > > > Hi Walter, > > it's not a problem to take down zk for a short (1h) time and > > reconfigure it. Meanwhile

simple test on solr 5.2.1 wrong leader elected on startup

2015-10-15 Thread Matteo Grolla
Hi, I'm doing this test: collection test is replicated on two solr nodes running on 8983, 8984 using external zk. 1) turn off solr 8984 2) add, commit a doc x on solr 8983 3) turn off solr 8983 4) turn on solr 8984 5) shortly after (leader still not elected) turn on solr 8983 6) 8984 is elected as

Re: simple test on solr 5.2.1 wrong leader elected on startup

2015-10-15 Thread Matteo Grolla
> > On 15 October 2015 at 16:16, Matteo Grolla <matteo.gro...@gmail.com> > wrote: > > > Hi, > > I'm doing this test > > collection test is replicated on two solr nodes running on 8983, 8984 > > using external zk > > > > 1)turn OFF solr 8984

Re: add and then delete same document before commit,

2015-11-18 Thread Matteo Grolla
ailure in 4.6 and a commit happened between the original > insert and the delete? Just askin'... > > Best, > Erick > > On Wed, Nov 18, 2015 at 8:21 AM, Matteo Grolla <matteo.gro...@gmail.com> > wrote: > > Thanks Shawn, > >I'm aware that solr isn't transactional and

Re: add and then delete same document before commit,

2015-11-18 Thread Matteo Grolla
solr version. 2015-11-18 16:51 GMT+01:00 Shawn Heisey <apa...@elyograg.org>: > On 11/18/2015 8:21 AM, Matteo Grolla wrote: > > On Solr 4.10.3 I'm noting a different (desired) behaviour > > > > 1) add document x > > 2) delete document x > > 3) commit

Re: add and then delete same document before commit,

2015-11-18 Thread Matteo Grolla
On Solr 4.10.3 I'm noting a different (desired) behaviour 1) add document x 2) delete document x 3) commit document x doesn't get indexed. The question now is: Can I count on this behaviour or is it just incidental? 2014-11-05 22:21 GMT+01:00 Matteo Grolla <matteo.gro...@gmail.

Re: error reporting during indexing

2015-09-29 Thread Matteo Grolla
time when the batch has errors and rely on Solr overwriting > any docs in the batch that were indexed the first time. > > Best, > Erick > > On Mon, Sep 28, 2015 at 2:27 PM, Matteo Grolla <matteo.gro...@gmail.com> > wrote: > > Hi, > > if I need fine grained er

error reporting during indexing

2015-09-28 Thread Matteo Grolla
Hi, if I need fine grained error reporting I use Http Solr server and send 1 doc per request using the add method. I report errors on exceptions of the add method. I'm using autocommit so I'm not seeing errors related to commit. Am I losing some errors? Is there a better way? Thanks
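
A sketch of the one-document-per-request approach described above (paths and file layout are made up): each add is checked individually, so a failure can be attributed to a single document.

    # index one JSON document per request and record failures per document
    for f in docs/*.json; do
      curl -sf 'http://localhost:8983/solr/collection1/update' \
           -H 'Content-Type: application/json' --data-binary @"$f" > /dev/null \
        || echo "failed: $f" >> indexing-errors.log
    done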

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Matteo Grolla
; I ask, because these settings can solve the problems you've mentioned > > without the need to add any additional functionality. > > > > On Tue, Jan 5, 2016 at 9:04 PM Matteo Grolla <matteo.gro...@gmail.com> > > wrote: > > > >> Hi Binoy, > >>

Re: SOLR replicas performance

2016-01-05 Thread Matteo Grolla
Hi Luca, not sure if I understood well. Your question is "Why are index times on a solr cloud collection with 2 replicas higher than on solr cloud with 1 replica", right? Well, with 2 replicas all docs have to be separately indexed in 2 places and solr has to confirm that both indexing went

enable disable filter query caching based on statistics

2016-01-05 Thread Matteo Grolla
Hi, after looking at the presentation on cloudsearch from lucene revolution 2014 https://www.youtube.com/watch?v=RI1x0d-yO8A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP&index=49 min 17:08, I realized I'd love to be able to remove the burden of disabling filter query caching from developers. The problem:
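
For reference, the knob this thread would like to automate: today per-filter caching is chosen by the developer with the cache local param (field names and values are hypothetical).

    # the first fq is cached and reusable; the second is a one-off excluded from the filterCache
    curl 'http://localhost:8983/solr/collection1/select' \
      --data-urlencode 'q=*:*' \
      --data-urlencode 'fq=type:product' \
      --data-urlencode 'fq={!cache=false cost=100}user_id:12345'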

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Matteo Grolla
olrCaching > http://yonik.com/advanced-filter-caching-in-solr/ > > > On Tue, Jan 5, 2016 at 7:28 PM Matteo Grolla <matteo.gro...@gmail.com> > wrote: > > > Hi, > > after looking at the presentation of cloudsearch from lucene > revolution > > 2014 > >

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Matteo Grolla
clauses are very restrictive, I > wonder what happens if > you add a cost in. fq's are evaluated in cost order (when > cache=false), so what happens > in this case? > fq={!cache=false cost=101}n_rea:xxx&fq={!cache=false > cost=102}provincia:&fq={!cache=false cost=103}type:

realtime get requirements

2016-01-12 Thread Matteo Grolla
Hi, can you confirm that the realtime get requirements are just an updateLog configured with ${solr.ulog.dir:} and a /get request handler with omitHeader=true, wt=json and indent=true as defaults?
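
A sketch of that configuration as it appears in a stock solrconfig.xml; the XML markup was stripped from the archived message, so treat this as a reconstruction rather than a quote.

    <requestHandler name="/get" class="solr.RealTimeGetHandler">
      <lst name="defaults">
        <str name="omitHeader">true</str>
        <str name="wt">json</str>
        <str name="indent">true</str>
      </lst>
    </requestHandler>

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>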

Re: realtime get requirements

2016-01-12 Thread Matteo Grolla
Thanks Shawn, On a production solr instance some cores take a long time to load while others of similar size take much less. One of the differences between these cores is the directoryFactory. 2016-01-12 15:34 GMT+01:00 Shawn Heisey <apa...@elyograg.org>: > On 1/12/2016 2:50 A

Re: realtime get requirements

2016-01-12 Thread Matteo Grolla
ok, suggester was responsible for the long time to load. Thanks 2016-01-12 15:47 GMT+01:00 Matteo Grolla <matteo.gro...@gmail.com>: > Thanks Shawn, > On a production solr instance some cores take a long time to load > while other of similar size take much less. One of

optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Hi, I'm trying to optimize a solr application. The bottleneck is queries that request 1000 rows from solr. Unfortunately the application can't be modified at the moment; can you suggest what could be done on the solr side to increase the performance? The bottleneck is just on fetching the
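
One way to see where the time goes on the Solr side for such a request (parameters are illustrative): compare QTime and the per-component timings for different rows values.

    curl 'http://localhost:8983/solr/collection1/select?q=*:*&rows=1000&fl=id,title&debug=timing'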

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
and it takes 15s; execute it with rows = 400 and it takes 3s. It seems that below rows = 400 times are acceptable, beyond that they get slow. 2016-02-11 11:27 GMT+01:00 Upayavira <u...@odoko.co.uk>: > > > On Thu, Feb 11, 2016, at 09:33 AM, Matteo Grolla wrote: > > Hi, > > I'

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Matteo Grolla <matteo.gro...@gmail.com>: > Hi Yonic, > after the first query I find 1000 docs in the document cache. > I'm using curl to send the request and requesting javabin format to mimic > the application. > gc activity is low > I managed to load the entire 50GB index

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
anymore. Time improves: queries that took ~30s now take <10s, but I hoped for better. I'm going to use jvisualvm's sampler to analyze where time is spent. 2016-02-11 15:25 GMT+01:00 Yonik Seeley <ysee...@gmail.com>: > On Thu, Feb 11, 2016 at 7:45 AM, Matteo Grolla <matteo.gro...@gma

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
<jack.krupan...@gmail.com>: > Is this a scenario that was working fine and suddenly deteriorated, or has > it always been slow? > > -- Jack Krupansky > > On Thu, Feb 11, 2016 at 4:33 AM, Matteo Grolla <matteo.gro...@gmail.com> > wrote: > > > Hi, > >

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
[image: embedded image 1] 2016-02-11 16:05 GMT+01:00 Matteo Grolla <matteo.gro...@gmail.com>: > I see a lot of time spent in splitOnTokens > > which is called by (last part of stack trace) > > BinaryResponseWriter$Resolver.writeResultsBody() > ... > solr.sea

Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Matteo Grolla
re consuming the bulk of qtime. > > -- Jack Krupansky > > On Thu, Feb 11, 2016 at 11:33 AM, Matteo Grolla <matteo.gro...@gmail.com> > wrote: > > > virtual hardware, 200ms is taken on the client until response is written > to > > disk > > qtime on solr is

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
<t...@statsbiblioteket.dk>: > On Thu, 2016-02-11 at 11:53 +0100, Matteo Grolla wrote: > > I'm working with solr 4.0, sorting on score (default). > > I tried setting the document cache size to 2048, so all docs of a single > > request fit (2 requests fit actually) >

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
> What does the query look like? Is it complex or does it use wildcards or function > queries, or is it very simple keywords? How many operators? > > Have you used the debugQuery=true parameter to see which search components > are taking the time? > > -- Jack Krupansky > > On Thu,

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
nt modern hardware. > > -- Jack Krupansky > > On Thu, Feb 11, 2016 at 10:36 AM, Matteo Grolla <matteo.gro...@gmail.com> > wrote: > > > Hi Jack, > > response time scale with rows. Relationship doens't seem linear but > > Below 400 rows times are much fas

query logging using query rest api

2016-04-28 Thread Matteo Grolla
Hi, I'm experimenting with the query REST API on solr 5.4 and I'm noticing that query parameters are not logged in solr.log. Here are the query and the log line: curl -XGET 'localhost:8983/solr/test/query' -d '{"query":"*:*"}' 2016-04-28 09:16:54.008 INFO (qtp668849042-17) [ x:test] o.a.s.c.S.Request

Re: problems with nested queries

2016-05-23 Thread Matteo Grolla
c name_t:"white cat" > > Can you open a JIRA for this? > > -Yonik > > > On Mon, May 16, 2016 at 10:23 AM, Matteo Grolla <matteo.gro...@gmail.com> > wrote: > > Hi everyone, > > I have a problem with nested queries > > I

problems with nested queries

2016-05-16 Thread Matteo Grolla
Hi everyone, I have a problem with nested queries. If the order is: 1) query 2) nested query (embedded in _query_:"...") everything works fine; if it is the opposite, like this

Re: query logging using query rest api

2016-05-02 Thread Matteo Grolla
you cannot use GET > HTTP method ( -XGET ) and pass parameters in POST (-d). > > Try to remove the -XGET parameter. > > On Thu, Apr 28, 2016 at 11:18 AM, Matteo Grolla <matteo.gro...@gmail.com> > wrote: > > > Hi, > > I'm experimenting the query rest api wi
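
The corrected invocation from this reply: dropping -XGET lets curl send a plain POST, which should make the parameters appear in the request log as suggested.

    curl 'http://localhost:8983/solr/test/query' -H 'Content-Type: application/json' \
      -d '{"query": "*:*"}'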

Re: matchAllDocsQuery instead of WildCardQuery from lucene qp with df and *

2016-07-28 Thread Matteo Grolla
Hi Alessandro, your shot in the dark was interesting, but the behaviour doesn't depend on the field being mandatory; it works like this for every field. So it seems just wrong: df=field&q=* should be translated as field:*, not as *:*. 2016-07-28 10:32 GMT+02:00 Matteo Grolla <matteo.

Re: matchAllDocsQuery instead of WildCardQuery from lucene qp with df and *

2016-07-28 Thread Matteo Grolla
(field);
// *:* -> MatchAllDocsQuery
if ("*".equals(termStr)) {
  if ("*".equals(field) || getExplicitField() == null) {
    return newMatchAllDocsQuery();
  }
}

2016-07-28 9:40 GMT+02:00 Matteo Grolla <matteo.gro...@gmail.com>: > I noticed the behaviour

Re: matchAllDocsQuery instead of WildCardQuery from lucene qp with df and *

2016-07-28 Thread Matteo Grolla
I noticed the behaviour in solr 4.10 and 5.4.1. 2016-07-28 9:36 GMT+02:00 Matteo Grolla <matteo.gro...@gmail.com>: > Hi, > I'm surprised by lucene query parser translating this query > > http://localhost:8983/solr/collection1/select?df=id&q=* > > in > >

matchAllDocsQuery instead of WildCardQuery from lucene qp with df and *

2016-07-28 Thread Matteo Grolla
Hi, I'm surprised by the lucene query parser translating this query http://localhost:8983/solr/collection1/select?df=id&q=* into MatchAllDocsQuery(*:*). I was expecting it to execute "id:*". Is it a bug or a desired behaviour? If desired, can you explain why?

solr-8258

2016-07-07 Thread Matteo Grolla
Hi, the export handler returns 0 for null numeric values. Can someone explain why it doesn't leave the field off the record, like it does for string or multivalued fields? thanks Matteo

export handler date fields

2016-07-07 Thread Matteo Grolla
Hi, is there a reason why the export handler doesn't support date fields? thanks Matteo Grolla

size-estimator-lucene-solr.xls error in disk space estimator

2017-04-27 Thread Matteo Grolla
It seems to me that the estimation in MB is in fact an estimation in GB: the formula includes the avg doc size, which is in KB, so the result is in KB and should be divided by 1024 to obtain the result in MB. But it's divided by 1024*1024.
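
A worked example of the discrepancy, with illustrative numbers:

    avg doc size = 5 KB, docs = 10,000,000
    raw estimate        = 5 KB * 10,000,000           = 50,000,000 KB
    correct MB figure   = 50,000,000 KB / 1024        ≈ 48,828 MB   (≈ 47.7 GB)
    spreadsheet figure  = 50,000,000 KB / (1024*1024) ≈ 47.7        (labelled MB, actually GB)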

Re: size-estimator-lucene-solr.xls error in disk space estimator

2017-04-27 Thread Matteo Grolla
Right Alessandro that's another bug Cheers 2017-04-27 12:30 GMT+02:00 alessandro.benedetti : > +1 > I would add that what is called : "Avg. Document Size (KB)" seems more to > me > "Avg. Field Size (KB)". > Cheers > > > > - > --- > Alessandro Benedetti >

analyzing infix suggester building in near real time LUCENE-5477

2018-05-21 Thread Matteo Grolla
Hi everyone, I'm evaluating suggesters that can be built in near real time and I came across https://issues.apache.org/jira/browse/LUCENE-5477. Is there a way to use this functionality from solr? Thanks very much Matteo Grolla
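
For reference, the usual Solr-side wiring of the analyzing infix suggester (field and type names are hypothetical); whether the incremental add/update path from LUCENE-5477 is reachable through this component is exactly the open question here.

    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">infixSuggester</str>
        <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">title</str>
        <str name="suggestAnalyzerFieldType">text_general</str>
      </lst>
    </searchComponent>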

Uninverting stats on solr 5 and beyond

2018-04-09 Thread Matteo Grolla
Hi, on solr 4 the log contained information about the time spent and memory consumed uninverting a field. Where can I find this information in current versions of solr? Thanks --excerpt from solr 4.10 log-- INFO - 2018-04-09 15:57:58.720; org.apache.solr.request.UnInvertedField; UnInverted

Re: Could not load collection from ZK:

2018-09-20 Thread Matteo Grolla
Hi everybody, I'm facing the same problem on solr 7.3. Probably requesting a longer zk session (the default 10s seems too short) will solve the problem, but I'm puzzled by the fact that this error is reported by solrj as a SolrException with status code 400 (BAD_REQUEST). In ZkStateReader
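
The session length mentioned above is controlled by zkClientTimeout; a sketch of raising it in solr.xml (the 30s value is illustrative, not a recommendation from the thread):

    <solrcloud>
      <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    </solrcloud>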