Get Recently Added/Updated Documents

2016-03-15 Thread Lyuba Romanchuk
Hi,

I have the following scenario:

   - there are 2 machines running Solr 4.8.1
   - the machines are in different time zones
   - the clocks of the two machines are not synchronized

An auto-refresh query running every X-2 seconds should return the documents
from the last X seconds, with as little performance impact as possible
(ideally, it should take less than a second).

First of all, I added a first-component that overrides the NOW param set by
the main shard, so that each Solr machine uses its own local NOW time.
I also added a new custom function:
recent_docs(ms_since_now(_version_), X) = recip(ms(NOW, _version_ converted
to milliseconds), 0.01/X, 1, 1).
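For intuition, Solr's recip(x, m, a, b) computes a/(m*x + b), so with a = b = 1 the score is 1.0 for a document updated right now and decays toward 0 as it ages. A quick sketch of the curve for the constants above (plain arithmetic with a hypothetical X; not Solr code):

```python
def recip(x, m, a=1.0, b=1.0):
    # Solr's recip(x, m, a, b) = a / (m*x + b)
    return a / (m * x + b)

X = 10_000           # hypothetical recency window (same unit as x)
m = 0.01 / X         # the constant used in the question

print(recip(0, m))        # age 0     -> 1.0
print(recip(X, m))        # age X     -> 1/1.01, still ~0.99
print(recip(900 * X, m))  # age 900*X -> 0.1 (since m*x + b == 10)
```

Note that with m = 0.01/X the score only drops to 0.1 at an age of 900*X, so the [0.1, 1] range corresponds to a much wider window than X itself; the constant may be worth re-deriving against the intended cutoff.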

Then I thought of 2 possible solutions, but each one has a disadvantage,
and now I am trying to decide which is optimal.
And maybe there are other solutions that I didn't think about.

   1. *Solution 1*: use boosting on the _version_ field, like this:
      q={!boost b=recent_docs(ms_since_now(_version_),X)}*:*
      1. I use _version_ because I need to receive the recently updated
         documents, and the document's own time field shouldn't change.
         And I saw from the code that _version_ is calculated based on
         the time.
      2. It's good for sorting, because all documents are sorted by score,
         but in this case all documents match and I need to return only
         documents with a score in [0.1, 1]. I could filter by the
         _version_ field, but I prefer not to, for performance reasons.
      3. *Questions*:
         1. what is the performance impact of such scoring?
         2. *how can I return only documents with a score from 0.1 to 1*?
   2. *Solution 2*: use a function range query, like this:
      fq={!frange l=0.1 u=1}recent_docs(ms_since_now(_version_),X)
      1. in this case only the relevant documents are returned, but they
         are not sorted, and sorting by _version_ or adding scoring seems
         inefficient because the same function would be calculated twice
      2. it seems that there is a very high performance impact to using
         this function query on large cores with hundreds of millions of
         documents
      3. *Questions*:
         1. *what is the most optimal way to sort the returned documents
            without calculating the same function twice*?
         2. what is the performance impact of such a filter query? Is the
            FieldCache used?
         3. could it drastically increase the memory consumption of Solr
            on frequently updated cores with millions of documents?
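On point 1.1 above: in the Solr code, the _version_ assigned to a document embeds a wall-clock timestamp in its high bits (roughly version = (epoch_millis << 20) | counter), which is what makes function queries over _version_ usable as a recency proxy. A hedged sketch of that packing, assuming the layout holds for the Solr version in use:

```python
def make_version(millis: int, counter: int = 0) -> int:
    # Solr's version clock: epoch millis in the high bits, a 20-bit
    # counter in the low bits to order same-millisecond updates.
    return (millis << 20) | (counter & 0xFFFFF)

def version_to_millis(version: int) -> int:
    # Recover the approximate update time from a _version_ value.
    return version >> 20

millis = 1458000000000  # an arbitrary epoch-millis timestamp
v = make_version(millis, counter=7)
print(version_to_millis(v))  # -> 1458000000000
```

Under this layout, ms_since_now(_version_) amounts to NOW_millis - (_version_ >> 20).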


Any assistance/suggestion/comment will be very appreciated.

Thank you.

Best regards,
Lyuba


adding support for deleteInstanceDir from solrj

2013-08-26 Thread Lyuba Romanchuk
Hi all,

Did anyone have a chance to look at the code?

It's attached here: https://issues.apache.org/jira/browse/SOLR-5023.



Thank you very much.

Lyuba


Re: [Solr 4.2] deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload CoreAdminRequest

2013-07-09 Thread Lyuba Romanchuk
According to the code, at least in Solr 4.2, getParams of CoreAdminRequest.Unload
returns a locally created ModifiableSolrParams.
It means that parameters set this way won't be received in
CoreAdminHandler.

I'm going to open an issue in Jira and provide a patch for this.

Best regards,
Lyuba



On Fri, Jul 5, 2013 at 6:12 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 SolrJ doesn't have explicit support for that param but you can always
 add it yourself.

 For example:
 CoreAdminRequest.Unload req = new CoreAdminRequest.Unload(false);
 ((ModifiableSolrParams) req.getParams()).set("deleteInstanceDir", true);
 req.process(server);

 On Thu, Jul 4, 2013 at 12:50 PM, Lyuba Romanchuk
 lyuba.romanc...@gmail.com wrote:
  Hi,
 
  I need to unload a core and delete the core's instance directory.
  According to the code of Solr 4.2, I don't see support for this
  parameter in SolrJ.
  Is there a fix or an open issue for this?
 
  Best regards,
  Lyuba



 --
 Regards,
 Shalin Shekhar Mangar.



[Solr 4.2] deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload CoreAdminRequest

2013-07-04 Thread Lyuba Romanchuk
Hi,

I need to unload a core and delete the core's instance directory.
According to the code of Solr 4.2, I don't see support for this parameter
in SolrJ.
Is there a fix or an open issue for this?

Best regards,
Lyuba


Re: [Solr 4.2.1] LotsOfCores - Can't query cores with loadOnStartup=true and transient=true

2013-05-22 Thread Lyuba Romanchuk
Hi Erick,

I opened an issue in JIRA: SOLR-4850. But I don't see how to change the
assignee; I don't think I have permission to do it.


Thank you.
Best regards,
Lyuba


On Mon, May 20, 2013 at 6:05 PM, Erick Erickson erickerick...@gmail.com wrote:

 Lyuba:

 Could you go ahead and raise a JIRA and assign it to me to
 investigate? You should definitely be able to define cores this way.

 Thanks,
 Erick

 On Sun, May 19, 2013 at 9:27 AM, Lyuba Romanchuk
 lyuba.romanc...@gmail.com wrote:
  Hi,
 
  It seems like in order to query transient cores they must be defined with
  loadOnStartup=false.
 
  I define one core with loadOnStartup=true and transient=false, the other
  cores with loadOnStartup=true and transient=true, and
  transientCacheSize=Integer.MAX_VALUE.
 
  In this case CoreContainer.dynamicDescriptors will be empty, and then
  CoreContainer.getCoreFromAnyList(String) and CoreContainer.getCore(String)
  return null for all transient cores.

  I looked at the code of 4.3.0 and it doesn't seem that the flow has
  changed: the core is added only if it's not loaded on startup.
 
  Could you please assist with this issue?
 
  Best regards,
  Lyuba



[Solr 4.2.1] LotsOfCores - Can't query cores with loadOnStartup=true and transient=true

2013-05-19 Thread Lyuba Romanchuk
Hi,

It seems like in order to query transient cores they must be defined with
loadOnStartup=false.

I define one core with loadOnStartup=true and transient=false, the other
cores with loadOnStartup=true and transient=true, and
transientCacheSize=Integer.MAX_VALUE.
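For concreteness, the setup described above corresponds to old-style solr.xml core entries along these lines (core names hypothetical; a sketch for Solr 4.2.x, not a verified configuration):

```xml
<cores adminPath="/admin/cores" transientCacheSize="2147483647">
  <core name="main_core"  instanceDir="main_core"
        loadOnStartup="true" transient="false"/>
  <core name="trans_core" instanceDir="trans_core"
        loadOnStartup="true" transient="true"/>
</cores>
```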

In this case CoreContainer.dynamicDescriptors will be empty, and then
CoreContainer.getCoreFromAnyList(String) and CoreContainer.getCore(String)
return null for all transient cores.

I looked at the code of 4.3.0 and it doesn't seem that the flow has
changed: the core is added only if it's not loaded on startup.

Could you please assist with this issue?

Best regards,
Lyuba


Re: Solr 4.0 - timeAllowed in distributed search

2013-01-21 Thread Lyuba Romanchuk
Hi Michael,

Thank you very much for your reply!

Does it mean that when timeAllowed is used, only the search is interrupted
and document retrieval is not?

In order to check the total time of the query, I ran curl with the Linux
time command to measure the total time including retrieval of the
documents. If I understood your answer correctly, I should have gotten a
similar total time in both cases, but according to the results each total
time is close to its own QTime, not to the other case:

   - for non-distributed: QTime=789 ms while the total time is ~1 sec
   - for distributed: QTime=7.75 sec and the total time is 7.9 sec.

Here is the output of the curls (direct_query.xml and distributed_query.xml
contain 30,000 documents in the reply):

Directly ask the shard:

time curl 'http://localhost:8983/solr/shard_2013-01-07/select?q=*:*&rows=30000&timeAllowed=500&partialResults=true&debugQuery=true' > direct_query.xml

real    0m1.025s
user    0m0.008s
sys     0m0.053s

from direct_query.xml:

<lst name="responseHeader">
  <bool name="partialResults">true</bool>
  <int name="status">0</int>
  <int name="QTime">789</int>
  <lst name="params">
    <str name="rows">30000</str><str name="q">*:*</str>
    <str name="timeAllowed">500</str>
    <str name="partialResults">true</str>
    <str name="debugQuery">true</str></lst></lst>
<result name="response" numFound="28965249" start="0">



Ask the shard through distributed search:

time curl 'http://localhost:8983/solr/shard_2013-01-07/select?q=*:*&rows=30000&shards=127.0.0.1%3A8983%2Fsolr%2Fshard_2013-01-07&timeAllowed=500&partialResults=true&shards.info=true&debug=true' > distributed_query.xml

real    0m7.905s
user    0m0.010s
sys     0m0.052s


from distributed_query.xml:


<lst name="responseHeader">
  <bool name="partialResults">true</bool>
  <int name="status">0</int>
  <int name="QTime">7750</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="debug">true</str>
    <str name="shards">127.0.0.1:8983/solr/shard_2013-01-07</str>
    <str name="partialResults">true</str>
    <str name="shards.info">true</str>
    <str name="rows">30000</str>
    <str name="timeAllowed">500</str></lst></lst>
<lst name="shards.info">
  <lst name="127.0.0.1:8983/solr/shard_2013-01-07">
    <long name="numFound">28193020</long>
    <float name="maxScore">1.0</float>
    <long name="time">895</long></lst></lst>
<result name="response" numFound="28193020" start="0" maxScore="1.0">
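When comparing runs like the two above, it is handy to pull QTime and partialResults out of the response programmatically rather than eyeballing the XML; a small sketch using Python's stdlib parser (the sample below is an abridged version of the responseHeader above):

```python
import xml.etree.ElementTree as ET

# Abridged responseHeader from the distributed query above.
sample = """<response>
  <lst name="responseHeader">
    <bool name="partialResults">true</bool>
    <int name="status">0</int>
    <int name="QTime">7750</int>
  </lst>
</response>"""

def read_header(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    header = root.find("lst[@name='responseHeader']")
    qtime = int(header.find("int[@name='QTime']").text)
    partial = header.find("bool[@name='partialResults']")
    return {"QTime": qtime,
            "partialResults": partial is not None and partial.text == "true"}

print(read_header(sample))  # -> {'QTime': 7750, 'partialResults': True}
```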




Best regards,
Lyuba


On Sun, Jan 20, 2013 at 6:49 PM, Michael Ryan mr...@moreover.com wrote:

 (This is based on my knowledge of 3.6 - not sure if this has changed in
 4.0)

 You are using rows=30000, which requires retrieving 30,000 documents from
 disk. In a non-distributed search, the QTime will not include the time it
 takes to retrieve these documents, but in a distributed search, it will.
 For a *:* query, the document retrieval will almost always be the slowest
 part of the query. I'd suggest measuring how long it takes for the response
 to be returned, or use rows=0.

 The timeAllowed feature is very misleading. It only applies to a small
 portion of the query (which in my experience is usually not the part of the
 query that is actually slow). Do not depend on timeAllowed doing anything
 useful :)

 -Michael

 -Original Message-
 From: Lyuba Romanchuk [mailto:lyuba.romanc...@gmail.com]
 Sent: Sunday, January 20, 2013 6:36 AM
 To: solr-user@lucene.apache.org
 Subject: Solr 4.0 - timeAllowed in distributed search

 Hi,

 I try to use timeAllowed in a query, both in a distributed search with one
 shard and directly against the same shard.
 I send the same query with timeAllowed=500:

- directly to the shard: QTime ~= 600 ms
- through distributed search to the same shard: QTime ~= 7 sec.

 I have two questions:

- It seems that the timeAllowed parameter doesn't work for distributed
search, does it?
- What may be the reason that the query to the shard through
distributed search takes much more time than the query to the shard
directly (the same difference remains without the timeAllowed parameter
in the query)?


 Test results:

 Ask one shard through distributed search:



 http://localhost:8983/solr/shard_2013-01-07/select?q=*:*&rows=30000&shards=127.0.0.1%3A8983%2Fsolr%2Fshard_2013-01-07&timeAllowed=500&partialResults=true&shards.info=true&debugQuery=true
 <response>
 <lst name="responseHeader">
 <bool name="partialResults">true</bool>
 <int name="status">0</int>
 <int name="QTime">7307</int>
 <lst name="params">
 <str name="q">*:*</str>
 <str name="shards">127.0.0.1:8983/solr/shard_2013-01-07</str>
 <str name="partialResults">true</str>
 <str name="debugQuery">true</str>
 <str name="shards.info">true</str>
 <str name="rows">30000</str>
 <str name="timeAllowed">500</str></lst></lst>
 <lst name="shards.info">
 <lst name="127.0.0.1:8983/solr/shard_2013-01-07">
 <long name="numFound">29574223</long>
 <float name="maxScore">1.0</float>
 <long name="time">646</long></lst></lst>
 <result name="response" numFound="29574223" start="0" maxScore="1.0"> ...
 30,000 docs
 ...
 <lst name="debug">
 <str name="rawquerystring">*:*</str>
 <str name="querystring">*:*</str>
 <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
 <str name="parsedquery_toString">*:*</str>
 <str name="QParser">LuceneQParser</str>
 <lst name="timing"><double name="time">6141.0</double><lst
 name="prepare"><double name="time">0.0</double><lst
 name=

[Solr 4.0] what is stored in .tim index file format?

2012-04-17 Thread Lyuba Romanchuk
Hi,

I have index ~31G where
27% of the index size is .fdt files (8.5G)
20% - .fdx files (6.2G)
37% - .frq files (11.6G)
16% - .tim files (5G)

I didn't manage to find the description for .tim files. Can you help me
with this?

Thank you.
Best regards,
Lyuba


[Solr 4.0] Is it possible to do soft commit from code and not configuration only

2012-04-12 Thread Lyuba Romanchuk
Hi,



I need to configure Solr so that the open searcher will see a new
document immediately after it is added to the index.

And I don't want to perform a commit each time a new document is added.

I tried to configure maxDocs=1 under autoSoftCommit in solrconfig.xml but
it didn't help.

Is there a way to perform a soft commit from code in Solr 4.0?


Thank you in advance.

Best regards,

Lyuba
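For reference, the configuration-side counterpart of what is described above is the autoSoftCommit block in solrconfig.xml; a hedged sketch (values illustrative, not a recommendation):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: durability only; no need to open a new searcher. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: visibility; maxDocs=1 makes every add visible,
       at a significant cost under high insert rates. -->
  <autoSoftCommit>
    <maxDocs>1</maxDocs>
  </autoSoftCommit>
</updateHandler>
```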


Re: [Solr 4.0] Is it possible to do soft commit from code and not configuration only

2012-04-12 Thread Lyuba Romanchuk
Hi Mark,

Thank you for reply.

I tried to normalize the data like in relational databases:

   - there are several types of documents, where:
      - documents with the same type have the same fields
      - documents with different types may have different fields
      - but all documents have a type field and a unique key field id
   - there is a main type (all records with this type contain pointers
     to the corresponding records of other types)

There is a configuration that defines what information should be stored
for each type.
When I get new data for indexing, first of all I check whether such a
document is already in the index, using facets on the corresponding fields
and a query on the relevant type.
I add documents to the Solr index without committing from code, but with
autoCommit and autoSoftCommit with maxDocs=1 in solrconfig.xml.
But there is a problem: if I add a new record for some type, the searcher
doesn't see it immediately.
As a result I get several equal records with the same type but different
ids (the unique key).

If I do a commit from code after each document is added it works OK, but
that's not a solution.
So I wanted to try doing a soft commit from code after adding documents
with a non-main type. I searched the wiki documents but found only commit
without parameters and commit with parameters that don't seem to be what I
need.

Best regards,
Lyuba

On Thu, Apr 12, 2012 at 6:55 PM, Mark Miller markrmil...@gmail.com wrote:


 On Apr 12, 2012, at 11:28 AM, Lyuba Romanchuk wrote:

  Hi,
 
 
 
  I need to configure Solr so that the open searcher will see a new
  document immediately after it is added to the index.

  And I don't want to perform a commit each time a new document is added.

  I tried to configure maxDocs=1 under autoSoftCommit in solrconfig.xml but
  it didn't help.

 Can you elaborate on "didn't help"? You couldn't find any docs unless you
 did an explicit commit? If that is true and there is no user error, this
 would be a bug.

 
  Is there a way to perform a soft commit from code in Solr 4.0?

 Yes - check out the wiki docs - I can't remember how it is offhand (I
 think it was slightly changed recently).

 
 
  Thank you in advance.
 
  Best regards,
 
  Lyuba

 - Mark Miller
 lucidimagination.com


[Solr 4.0] soft commit with API of Solr 4.0

2012-04-11 Thread Lyuba Romanchuk
Hi All,

Is there a way to perform a soft commit from code in Solr 4.0?
Is it possible only from solrconfig.xml through enabling autoSoftCommit
with maxDocs and/or maxTime attributes?

Thank you in advance.
Best regards,
Lyuba


[Solr 3.5] Facets and stats become a lot slower during concurrent inserts

2011-12-27 Thread Lyuba Romanchuk
Hi,

I am testing facets and stats in Solr 3.5 and I see that queries run a
lot slower during inserts into an index with more than 15M documents.
If I stop inserting new documents, facet/stats queries run 10-1000 times
faster than with concurrent inserts.
I don't see this degradation in Lucene.

Could you please explain what may cause this?
Is it Solr related issue only?

Thank you for help.

Best regards,
Lyuba


Re: [Solr 3.5] Facets and stats become a lot slower during concurrent inserts

2011-12-27 Thread Lyuba Romanchuk
autoCommit is disabled in solrconfig.xml and I use
SolrServer::addBeans(beans, 100) for inserts.
I need to insert new documents continually at a high rate with queries
running concurrently.

Best regards,
Lyuba

On Tue, Dec 27, 2011 at 6:15 PM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Tue, Dec 27, 2011 at 10:43 AM, Lyuba Romanchuk
 lyuba.romanc...@gmail.com wrote:
  I test facets and stats in Solr 3.5 and I see that queries are running a
  lot slower during inserts into an index with more than 15M documents.

 Are you also doing commits (or have autocommit enabled)?
 The first time a facet command is used for a field after a commit,
 certain data structures need to be constructed.
 To avoid slow first requests like this, you can add a request that
 does the faceting as a static warming query that will be run before
 any live queries use the new searcher.

 -Yonik
 http://www.lucidimagination.com
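Yonik's static-warming suggestion maps to a newSearcher listener in solrconfig.xml; a hedged sketch (the facet field name is hypothetical and should be replaced with the fields actually faceted on):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Warm the facet data structures before the new searcher goes live. -->
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">my_facet_field</str>
    </lst>
  </arr>
</listener>
```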