timeAllowed behavior

2016-03-07 Thread Anatoli Matuskova
Hey there,

I'm a bit lost with timeAllowed lately. I'm not using SolrCloud and have a
monolithic index. I have Solr version 4.5.1 in production. Now I'm
testing Solr 5 and timeAllowed seems to behave differently. In 4.5, when it
was hit, it used to return the partial results it could collect. Now, when
timeAllowed is hit, it's not returning any partial documents.

Is that normal?

Thanks in advance.
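Incidentally, recent Solr versions flag a truncated search with a partialResults entry in the responseHeader, so a client can at least detect the case. A minimal sketch of checking for it (the JSON body is a made-up example shaped like such a response, not captured from a live instance):

```python
import json

# Hypothetical body, shaped like a Solr JSON response where timeAllowed
# was exceeded: Solr flags this with partialResults in the responseHeader.
raw = """
{
  "responseHeader": {"status": 0, "QTime": 1001, "partialResults": true},
  "response": {"numFound": 42, "docs": []}
}
"""

def is_partial(body: str) -> bool:
    """Return True when the responseHeader flags a partial result set."""
    header = json.loads(body).get("responseHeader", {})
    return bool(header.get("partialResults"))

print(is_partial(raw))  # True for the sample body above
```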



--
View this message in context: 
http://lucene.472066.n3.nabble.com/timeAllowed-behavior-tp4262110.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: doubt about timeAllowed

2016-02-16 Thread Anatoli Matuskova
Is there any way to tell timeAllowed to affect just the query component and not the
others?





doubt about timeAllowed

2016-02-15 Thread Anatoli Matuskova
Hey there,

I have a doubt about using timeAllowed. Long ago it used to affect just the
queryComponent; now it seems to affect all components. So, in the past,
if you used queryComponent, facetComponent and highlightComponent, you might
get a subset of the results from the queryComponent due to the query taking
too long (timeAllowed kicking in), but you would always get the highlighted
part of that subset.

With the latest versions, it can happen that due to timeAllowed I get a
subset of the results from the queryComponent, but, also due to timeAllowed,
I get no output from the highlightComponent, which is critical for my use case.

Is there any way to avoid that?

I remember that in the past there was a TimeLimitingCollector in the
getDocListAndSet function in SolrIndexSearcher, but now I see that class
has changed a lot and does not work that way anymore.
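The idea behind that collector can be sketched in a few lines (a toy Python model of the concept, not the actual Lucene TimeLimitingCollector):

```python
import time

class TimeExceeded(Exception):
    pass

class TimeLimitingCollector:
    """Toy collector: stops once a deadline passes, keeping the hits
    it gathered so far (the 'partial results' behaviour)."""

    def __init__(self, time_allowed_ms: float):
        self.deadline = time.monotonic() + time_allowed_ms / 1000.0
        self.hits = []

    def collect(self, doc_id: int):
        if time.monotonic() > self.deadline:
            raise TimeExceeded()
        self.hits.append(doc_id)

def search(doc_ids, collector):
    try:
        for doc_id in doc_ids:
            collector.collect(doc_id)
    except TimeExceeded:
        pass  # give up, but keep whatever was collected
    return collector.hits

print(search(range(5), TimeLimitingCollector(10_000)))  # plenty of time
print(search(range(5), TimeLimitingCollector(-1)))      # already expired
```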

Thanks in advance.







boost docs if token matches happen in the first 5 words

2013-07-18 Thread Anatoli Matuskova
I've a set of documents with a whitespace-tokenized field. I want to give more
boost when the query match happens in the first 3 token positions of
the field. Is there any way to do that? (I don't want to use payloads, as they
mean one more seek to disk and so lower performance.)
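One common Lucene-level approach is a SpanFirstQuery, which only matches spans within the first N positions and can be combined as a boosting clause. The scoring idea in a toy sketch (plain Python, not Solr code; names and numbers are made up):

```python
def position_boost(field_value: str, query_term: str,
                   window: int = 3, boost: float = 2.0) -> float:
    """Toy scorer: boost the score when the term occurs within the
    first `window` whitespace tokens of the field."""
    tokens = field_value.lower().split()
    for pos, tok in enumerate(tokens):
        if tok == query_term.lower():
            return boost if pos < window else 1.0
    return 0.0  # no match at all

print(position_boost("red oak dining table", "red"))            # 2.0: early
print(position_boost("a sturdy red oak dining table", "table")) # 1.0: late
```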
 





RE: boost docs if token matches happen in the first 5 words

2013-07-18 Thread Anatoli Matuskova
Thanks for the quick answer, Markus.
Could you give me a guideline, or point me to where in the Solr
source code to look, to see how to get it done?






Re: Solr indexer and Hadoop

2013-07-02 Thread Anatoli Matuskova
If you can upload your data to HDFS, you can use this patch to build the Solr
indexes:
https://issues.apache.org/jira/browse/SOLR-1301





Re: performance on concurrent search request

2013-04-05 Thread Anatoli Matuskova
Does anyone know how this is implemented?





performance on concurrent search request

2013-04-02 Thread Anatoli Matuskova
In this thread about performance on concurrent search requests, Otis said:
http://lucene.472066.n3.nabble.com/how-to-improve-concurrent-request-performance-and-stress-testing-td496411.html

"Imagine this type of code:

synchronized (someGlobalObject) {
  // search
}

What happens when 100 threads hit this spot? The first one to get there
gets in and runs the search, and the other 99 wait.
What happens if that // search also involves expensive operations, lots
of IO, warming up, cache population, etc.? Those 99 threads will have to
wait a while :)

That's why it is recommended to warm up the searcher ahead of time before
exposing it to real requests. However, even if you warm things up, that
sync block will remain there, and at some point this will become a
bottleneck. What that point is depends on the hardware, index size, query
complexity and rate, even the JVM.

-- Otis"

I'm wondering whether this synchronized block is still an issue in Solr 4.x. Is it
down to how Solr deals with the index searcher, or to how it is
implemented in Lucene?
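The pattern Otis describes can be mimicked with a plain lock (a toy Python sketch, nothing Solr-specific): all searches complete, but strictly one at a time.

```python
import threading

search_lock = threading.Lock()
results = []

def do_search(query: str) -> str:
    return f"hits for {query}"  # stand-in for the expensive part

def worker(i: int):
    # Every search funnels through one global lock, so while one thread
    # searches, the other 99 can only wait.
    with search_lock:
        results.append(do_search(f"q{i}"))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 100: all ran, serialized by the lock
```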





Re: bf, nested queries and local params

2013-01-22 Thread Anatoli Matuskova
q=table&bf=product(scale(product(query({!v='color'}),1),0,1),100)

This worked! Now, looking at the debug output, I see that the nested query is using
the default field and default operator from schema.xml. How could I pass params to
the nested query, such as defType or qf? I would like to do something like
this (but I'm getting errors):

q=table&bf=product(scale(product(query({!v='color' !qf=title,description
!defType=dismax !pf=title^2.0,description^2.0}),1),0,1),100)

I don't know how to express the params of the nested query; I get errors
all the time (removing the spaces between params results in errors too).
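For what it's worth, local params normally don't take a `!` before each parameter (only the parser name does, as in `{!dismax ...}`), and qf/pf take space-separated field^boost entries. A sketch of building and URL-encoding such a query (untested against a live Solr; the field names are the ones from the mail):

```python
from urllib.parse import urlencode

# Only the parser name carries the leading "!"; multi-field qf/pf values
# are space separated inside quotes.
nested = "{!dismax qf='title description' pf='title^2.0 description^2.0' v='color'}"
params = {
    "q": "table",
    "bf": f"product(scale(product(query({nested}),1),0,1),100)",
}
query_string = urlencode(params)
print(query_string)  # percent-encoded, safe to append after /select?
```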






bf, nested queries and local params

2013-01-21 Thread Anatoli Matuskova
I'm trying to achieve the following:

Having this query:
q=table&bf=product(scale({!type=dismax qf=description,color v='with
color'},0,1),price)

And on q I'm using (from solrconfig.xml) defType=dismax and qf=title,description.

I'm trying to query for table and influence the score by multiplying the
price by the score of a query for color on the fields
description and color, also using defType dismax, scaled from 0 to 1.

However, I'm getting this error:

<lst name="error">
  <str name="msg">undefined field: qf</str>
  <int name="code">400</int>
</lst>

So how could I nest a query (with defType=dismax) inside the first parameter
of the scale function?





Re: bf, nested queries and local params

2013-01-21 Thread Anatoli Matuskova
Even after doing this:

"Wrap the query argument for the scale function with the query function:

q=table&bf=product(scale(query({!type=dismax qf=description,color v='with
color'}),0,1),price)"

I'm still getting the same error. Something might be wrong in how I'm writing the
nested query, but I can't figure out what.





solr-4.0 and high cpu usage

2012-07-05 Thread Anatoli Matuskova
Hello,
I'm testing solr-4.0-alpha against 1.4. My index is optimized down to one
segment. I've seen a decrease in memory usage but a very large increase in
CPU. This high CPU usage ends up giving me slower response times with 4.0
than with 1.4.
The server I'm using: Linux 2.6.30-2-amd64, 16 2.26 GHz Intel Xeon
cores, 8 GB RAM.
I have JMeter sending queries using between 1 and 4 threads.
The queries use neither faceting nor filters: a simple search across 5 text
fields using dismax.
The index has 1M docs and is 1.2 GB in size.
I've checked memory with the JVM console and it's not the GC's fault.
The index is built using TieredMergePolicy for 4.0 and
LogByteSizeMergePolicy for 1.4, but both were optimized down to 1 segment (so in
that case the merge policy shouldn't make any difference, am I correct?).
I'm not indexing any docs while running the tests.
Average response time went from 0.1 s to 0.3 s.

Here is a graph of the CPU increase:
Any advice, or something I should take into account? With the same resources,
solr-4.0 is 3 times slower than 1.4 and I don't know what I'm doing
wrong.




http://lucene.472066.n3.nabble.com/file/n3993187/ganglia-solr.png 



Re: solr-4.0 and high cpu usage [SOLVED]

2012-07-05 Thread Anatoli Matuskova
Found out why!
On Solr 1.4, the dismax mm param defaults to 1 if not specified, which is
equivalent to AND. On Solr 4.0, if mm is not specified, the default operator
is used, which defaults to OR. That made each query I was running return many
more results, increasing the response time and the CPU usage.
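Whatever the exact defaults in each version, the effect on result-set size is easy to see in a toy model (plain Python with made-up docs, not Solr code):

```python
def matches(doc_tokens, query_terms, mm_all: bool) -> bool:
    """mm_all=True: every term must match (AND-like).
    mm_all=False: a single matching term is enough (OR-like)."""
    hits = sum(1 for t in query_terms if t in doc_tokens)
    return hits >= (len(query_terms) if mm_all else 1)

docs = [{"red", "oak", "table"}, {"red", "sofa"}, {"blue", "chair"}]
query = ["red", "table"]
print(sum(matches(d, query, mm_all=True) for d in docs))   # 1 doc (AND-like)
print(sum(matches(d, query, mm_all=False) for d in docs))  # 2 docs (OR-like)
```

More matching docs per query means more scoring work, which lines up with the CPU increase observed above.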




Re: deploy a brand new index in solrcloud

2012-06-10 Thread Anatoli Matuskova
I've thought about setting up replication in SolrCloud:
http://www.searchworkings.org/forum/-/message_boards/view_message/339527#_19_message_339527
What I don't know is whether, while replication is being handled, the replica
slaves (the ones that are not the replication master) can keep handling puts via the
transaction log.



Search calendar avaliability

2011-10-27 Thread Anatoli Matuskova
Hello,
I want to filter searches by calendar availability. For each document I know
the days on which it is not available.
How could I build my fields to filter for documents that are available in a
range of dates?
For example, document A is available from 1-9-2011 to 5-9-2011 and
from 17-9-2011 to 22-9-2011 too (it's not available in the gap in
between).
If the filter query asks for availability from 2-9-2011 to 4-9-2011, docA would
be a match.
If the filter query asks for availability from 2-9-2011 to 20-9-2011, docA wouldn't
be a match: even though the start and end dates are available, there's a gap of
unavailability between them.
Is this possible with Solr?



Re: Search calendar avaliability

2011-10-27 Thread Anatoli Matuskova
I don't like the idea of indexing a doc per value; the dataset can grow
a lot. I've thought that something like this could work:
At indexing time, since I know the dates of unavailability, I can derive the
available ones (treating unknown as available). So I index 4 fields:
aval_yes_start, aval_yes_end, aval_no_start, aval_no_end (all
multiValued).
If the user asks for availability from $start to $end, I filter like:

fq=aval_yes_start:[$start TO $end]&fq=aval_yes_end:[$start TO
$end]&fq=-aval_no_start:[$start TO $end]&fq=-aval_no_end:[$start TO
$end]

This way I make sure the start date is available, the end date too, and there are no
unavailable gaps in between.
As I store ranges and not individual days, the number of multiValued entries shouldn't
grow much, and using trie fields I think these range queries should be fast.

Any better idea?
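The intended semantics (a request matches only if one contiguous available range covers the whole [start, end] interval) can be modelled in a few lines (toy Python, not Solr code):

```python
from datetime import date

# Document A's availability as closed [start, end] ranges, mirroring the
# multiValued aval_yes_start / aval_yes_end fields described above.
doc_a = [(date(2011, 9, 1), date(2011, 9, 5)),
         (date(2011, 9, 17), date(2011, 9, 22))]

def available(ranges, start: date, end: date) -> bool:
    """True only if one contiguous range covers the whole requested stay;
    a gap between two ranges disqualifies the document."""
    return any(r_start <= start and end <= r_end for r_start, r_end in ranges)

print(available(doc_a, date(2011, 9, 2), date(2011, 9, 4)))   # True
print(available(doc_a, date(2011, 9, 2), date(2011, 9, 20)))  # False: gap
```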
 



Re: Search calendar avaliability

2011-10-27 Thread Anatoli Matuskova
 What does a lot mean?  How high is the sky? 
If I have 3 million docs I would end up with 3 million * available days.

 This can be done.  And given that you want long stretches of availability, 
 but what happens when a reservation is canceled?  You have to coalesce 
 intervals.  That isn't impossible, but it is a pain. 

 Would this count as premature optimization? 

I always build the index from scratch, indexing from an external data source and
getting the availability from there (along with all the other data for a document).

 If you want to drive down to a resolution of seconds, the document time
 slot 
 model doesn't work.  But for days, it probably does. 

Yes, the availability is defined per day, not per second.

I'm trying to find a way to make this perform as well as possible.
I've found this and it's interesting too:
https://issues.apache.org/jira/browse/SOLR-1913
But the only way I see to use it is to generate dynamic fields per month and
filter using them. The problem is that for each month I want to filter
a search request on, I would have to load a FieldCache.getInts, and I would quickly
run OOM.


