Hi Mikhail,


- Is it possible to keep both types of data in the same core? Why not?

We have two separate feeds populating what is mostly distinct data at different 
times, hence the two indexes. IndexB is also used by other products which don’t 
need any data from indexA.

- can you manually shard both indices by those longValues?

I’m not sure what you mean by this. Can you show an example?
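Our best guess is that this means co-partitioning: routing every document in both indexes to a shard chosen from its longValue, so that documents sharing a join value always sit in the same shard and each shard can run its join locally. The Python below is only a toy illustration of that guess, not anything Solr provides; the shard count, the modulo routing and the helper names are all assumptions.

```python
# Toy illustration of co-partitioning two indexes by the join key.
# NUM_SHARDS, modulo routing and all helper names are assumptions, not Solr APIs.
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count


def shard_for(long_value: int) -> int:
    """Pick a shard from the join key so both indexes agree on placement."""
    return long_value % NUM_SHARDS


def partition(docs):
    """Group documents (dicts with a 'longValue' key) by their target shard."""
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for(doc["longValue"])].append(doc)
    return shards


def local_join(a_docs, b_docs, universe):
    """Return indexA docs whose longValue appears in indexB docs of `universe`."""
    keys = {d["longValue"] for d in b_docs if d["universe"] == universe}
    return [d for d in a_docs if d["longValue"] in keys]


def sharded_join(index_a, index_b, universe):
    """Join shard by shard; no shard ever needs another shard's documents."""
    a_shards, b_shards = partition(index_a), partition(index_b)
    results = []
    for shard in range(NUM_SHARDS):
        results.extend(local_join(a_shards[shard], b_shards[shard], universe))
    return results


if __name__ == "__main__":
    index_a = [{"id": f"A{v}", "longValue": v} for v in range(10)]
    index_b = [
        {"longValue": 3, "universe": "U1"},
        {"longValue": 7, "universe": "U1"},
        {"longValue": 8, "universe": "U2"},
    ]
    print([d["id"] for d in sharded_join(index_a, index_b, "U1")])
```

Please correct us if that is not what you meant.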

- It seems like you query plenty of data. Don't you have another query/filter 
to intersect that join result with?

I hoped that would be the answer too, but sadly no: the only common field is 
this long.



Such a long time for "universe of 5 docs" seems really strange.

Yes, I would have thought the filter on indexB would speed things up, but it 
made no difference. The original join field was alphanumeric; when we used 
that, the query took an extra 15 seconds to process.



We are running 4.10.3. Can we use {!join ... score=none} with that version?
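If it is not, the request we would run against a copy of the index opened with Solr 5.3 is our current query with score=none added to the join's local params. The sketch below builds it with proper URL encoding; the host, core, field and universe values are taken from our original query, and the helper names are our own.

```python
# Build the join request from our original query, optionally adding score=...
# join_local_params and build_join_url are our own hypothetical helpers.
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8080/solr/indexA/select"


def join_local_params(score=None):
    """Return the {!join ...} prefix; score=None reproduces the 4.10.3 query."""
    params = "from=longValue to=longValue fromIndex=IndexB"
    if score is not None:
        params += f" score={score}"
    return "{!join " + params + "}"


def build_join_url(universe_value, score=None):
    """URL-encode q so the braces, spaces and '!' survive the HTTP request."""
    q = join_local_params(score) + f"universe:{universe_value}"
    return BASE_URL + "?" + urlencode({"q": q})


if __name__ == "__main__":
    print(build_join_url("universeValue", score="none"))
```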



Do you have a link to your talk at Berlin Buzzwords?



I’ve asked the developer to describe what he’s trying to do; hopefully this 
will make our goal clearer.

#######################################################################################################

We have a massive index describing the contents of our client portfolios.  Each 
client portfolio can contain between 1 and 1 million securities.  We have 
approximately 8000 portfolios, so the index has approximately 250 million 
documents.  And, as would be expected, a particular security can be held in 
many client portfolios.  Each document contains the portfolio id and security 
id.



We have another large index holding security information, containing about 30 
million entries.  Each document has the security id.



We are trying, via Solr, to run the following query, expressed for 
convenience in SQL:



select * from 'security information' where portfolio id = 'X'.



Of course, this is a simple idea in SQL: one simply joins the two indexes on 
security id.  However, when we perform a Solr join, our response time is 
approximately 50 seconds, and changing the 'start' position causes another 
50-second query (there seems to be no caching of the work performed in the 
initial query).
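Stated as an algorithm, the query above is a semi-join: collect the security ids held by portfolio 'X', then return the security documents whose id is in that set. The in-memory Python sketch below shows only that shape; the field names (portfolio_id, security_id, name) and the sample data are hypothetical stand-ins for our two indexes, and it says nothing about how Solr executes its join.

```python
# In-memory semi-join mirroring the SQL above (hypothetical field names and data).
def securities_for_portfolio(holdings, securities, portfolio_id):
    """Return security docs held by `portfolio_id`, each at most once.

    holdings:   iterable of {"portfolio_id", "security_id"} (the 250M-doc index)
    securities: iterable of {"security_id", ...}            (the 30M-doc index)
    """
    # Step 1: filter the holdings index and keep only the join key.
    held_ids = {h["security_id"] for h in holdings if h["portfolio_id"] == portfolio_id}
    # Step 2: keep security docs whose key is in that set. Because this is a
    # semi-join, a security listed twice for the same portfolio appears once.
    return [s for s in securities if s["security_id"] in held_ids]


if __name__ == "__main__":
    holdings = [
        {"portfolio_id": "X", "security_id": 1},
        {"portfolio_id": "X", "security_id": 2},
        {"portfolio_id": "Y", "security_id": 3},
    ]
    securities = [{"security_id": i, "name": f"sec-{i}"} for i in range(1, 5)]
    print([s["name"] for s in securities_for_portfolio(holdings, securities, "X")])
```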



How can we speed this query up significantly?



How can we force Solr to cache the initial expensive query so that subsequent 
changes to the start parameter are fast?
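One commonly suggested pattern, which we have not yet tried on our data, is to move the join from q into an fq and use q=*:*. Solr caches filter-query results in the filterCache without regard to start or rows, so requests for later pages could reuse the cached document set until a commit opens a new searcher, provided the cache is enabled and large enough. The sketch below only builds such paged requests; page_url, the page size of 50 and the reuse of our original query's parameters are our own assumptions.

```python
# Hypothetical paging helper: the join goes in fq (cacheable), q matches all.
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8080/solr/indexA/select"
JOIN_FQ = "{!join from=longValue to=longValue fromIndex=IndexB}universe:universeValue"


def page_url(page, rows=50):
    """Build the URL for one page; only start changes between pages."""
    params = {
        "q": "*:*",       # match everything; the join restricts results via fq
        "fq": JOIN_FQ,    # same fq on every page, so it can come from filterCache
        "start": page * rows,
        "rows": rows,
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for page in range(3):
        print(page_url(page))
```

The first page would still pay the full cost of the join, so this would address the paging question rather than the 50-second query itself.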

Thanks

Russ.

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: 08 September 2015 23:08
To: solr-user
Subject: Re: Solr Join between two indexes taking too long.



Hello Russ,



It's an interesting case! Can you give me some brief context?

- Is it possible to keep both types of data in the same core? Why not?

- can you manually shard both indices by those longValues?

- It seems like you query plenty of data. Don't you have another query/filter 
to intersect that join result with?



Such a long time for "universe of 5 docs" seems really strange. Can you open 
the index with Solr 5.3 and run the same query for universe:universeValue, but 
with the local param {!join ... score=none} added?

That triggers an alternative algorithm.
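A toy model may help show why the time would not depend on the universe size. Roughly, and as a simplification: the default join in Solr 4.x enumerates every term of the from field and checks each one against the documents matching the inner query, so its cost tracks the number of unique join values (about 10 million here), while the alternative that the score parameter selects first collects the join values of the matching documents and then looks them up in the to field, so its cost tracks the universe size. The Python below models only that difference with step counters; it is not Solr or Lucene code, and the postings layout, the one-value-per-document assumption and all names are hypothetical.

```python
# Toy cost model of two join strategies (not Solr/Lucene source code).
# Postings map term -> set of doc ids; from_doc_terms maps doc -> its single term.


def term_driven_join(from_postings, to_postings, from_matches):
    """Visit every from-field term; steps grow with the number of unique terms."""
    steps, result = 0, set()
    for term, docs in from_postings.items():
        steps += 1                             # one step per term visited
        if docs & from_matches:                # does a matching doc carry this term?
            result |= to_postings.get(term, set())
    return result, steps


def doc_driven_join(from_doc_terms, to_postings, from_matches):
    """Collect terms of matching docs first; steps grow with the match-set size."""
    steps, terms = 0, set()
    for doc in from_matches:
        steps += 1                             # one step per matching from doc
        terms.add(from_doc_terms[doc])
    result = set()
    for term in terms:
        steps += 1                             # one lookup per collected term
        result |= to_postings.get(term, set())
    return result, steps


if __name__ == "__main__":
    n_terms = 10_000
    from_doc_terms = {doc: doc for doc in range(n_terms)}
    from_postings = {t: {t} for t in range(n_terms)}
    to_postings = {t: {f"a{t}"} for t in range(n_terms)}
    universe = {1, 2, 3, 4, 5}                 # a 5-doc universe
    r1, s1 = term_driven_join(from_postings, to_postings, universe)
    r2, s2 = doc_driven_join(from_doc_terms, to_postings, universe)
    print(r1 == r2, s1, s2)                    # True 10000 10
```

In this model the term-driven step count stays at n_terms whether the universe holds 5 documents or 500,000.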



Also, profiler snapshots always help. I gave a brief introduction to join 
algorithms and their problems in Solr at the recent Berlin Buzzwords; feel free 
to have a look if you are interested.



On Tue, Sep 8, 2015 at 3:09 PM, Russell Taylor 
<russell.tay...@interactivedata.com> wrote:



> Hi,
> I hope somebody can help.
>
> We have two indexes: one holds the descriptive data and the
> other holds lists of docs of a certain type (called universes in
> our world). They need to be joined together to show a list of data
> from indexA where a filtered indexB (by universe:value) has matching
> longs (the join field).
>
> At the moment the query is taking 55 seconds; we need to get it under
> a second. Any help is most appreciated.
>
> INDEXES:
>
> Index A (primary index)
> 31 million docs with an alphanumeric converted to a long value, with
> a possible 10 million unique values.
>
> Index B (the joined index)
> 250 million documents with an alphanumeric converted to a long value,
> with a possible 10 million unique values.
> IndexB is filtered by universe, which could match between 1 and 500,000 docs.
>
> QUERY:
>
> http://127.0.0.1:8080/solr/indexA/select?q={!join+from=longValue+to=longValue+fromIndex=IndexB}universe:universeValue
>
> QTime is 55 seconds for a universe of either 5 docs or 500,000 docs.
>
> Thanks
>
> Russ.
>
> *******************************************************
> This message (including any files transmitted with it) may contain
> confidential and/or proprietary information, is the property of
> Interactive Data Corporation and/or its subsidiaries, and is directed
> only to the addressee(s). If you are not the designated recipient or
> have reason to believe you received this message in error, please
> delete this message from your system and notify the sender
> immediately. An unintended recipient's disclosure, copying,
> distribution, or use of this message or any attachments is prohibited and may
> be unlawful.
> *******************************************************
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>