Thanks @Joel Bernstein,
Actually I am using SolrCloud with 3 nodes/servers and have created 3 cores,
on servers 1, 2, and 3 respectively.
I need to implement a join operation on these cores, but join is not
supported on SolrCloud,
so I am thinking the Streaming API could solve my problem.
Still I am stuck on how to solve my problem: searching millions of IDs with
minimum response time.
@Upayavira, please elaborate.
Thanks
--
View this message in context:
http://lucene.472066.n3.nabble.com/how-to-search-miilions-of-record-in-solr-query-tp4248360p4248597.html
Sent from the Solr - User mailing list archive at Nabble.com.
@Ere Maijala
>> question is: WHY do you need to search for millions of IDs?
I am explaining:
I have a list of 1 million IDs.
I will search in Solr, for example, like below:
IP:8083/select?q=ID:(1,4,7,...upto 1 Millions)&rows=10&start=0, then it
will display 10 results;
for pagination the next search will
Thanks David. It is quite good to use for NRT.
Apologies, I didn't mention that facet search is really slow.
I found the reason below, which could explain it: I am using faceted
spatial search, which is getting slow.
To know more about solr hard and soft commits, have a look at this blog
MRIT (the MapReduceIndexerTool) is not designed for that scenario, so you simply can't.
What people usually do is have a process whereby, after
the initial bulk load, there is some way their system-of-record
"knows" what new docs have been added since and
indexes only those. Flume is sometimes used if you have
access.
Hi Shawn,
Thanks for your reply.
I have uploaded the screenshot here
https://www.dropbox.com/s/l5itfbaus1c9793/Memmory%20Usage.png?dl=0
Basically, Java(TM) Platform SE Library, which Solr is running on, is only
using about 22GB currently. However, the memory usage at the top says it is
using
Thank you both, that's really helpful. Luwak and Percolator look like good
places to dig deeper.
Best wishes
Will
*Will Moy*
Director
020 3397 5140
*Full Fact*
fullfact.org
Twitter • Facebook
• LinkedIn
Thanks Erik and Binoy,
This is a case I stumbled upon: with queries like
q=*:*&fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:,fq={!cache=false}type:
where the n_rea filter is highly selective,
I was able to get a >3x performance improvement by disabling the cache.
I think it's because the
What version of Solr? Prior to 5.2 the replicas were doing lots of
unnecessary work/being blocked, see:
https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/
Best,
Erick
On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla wrote:
> Hi Luca,
>
It sounds like you're not doing proper autowarming,
which you'd need to do with either hard or
soft commits that open new searchers.
see: https://wiki.apache.org/solr/SolrCaching#Cache_Warming_and_Autowarming
In particular, you should have a newSearcher event
that facets on the fields you expect
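A newSearcher entry in solrconfig.xml looks roughly like this ('category' is
a placeholder; list the facet fields your real queries actually hit):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- warm the facet fields users actually query -->
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>
```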
Hi Toke,
I read the server's memory usage from the Task manager under Windows,
Regards,
Edwin
On 4 January 2016 at 17:17, Toke Eskildsen wrote:
> On Mon, 2016-01-04 at 10:05 +0800, Zheng Lin Edwin Yeo wrote:
> > A) Before I start the optimization, the server's
It doesn't sound like a very good match with Solr - or any other search
engine or any relational database or data store for that matter. Sure,
maybe you can get something to work with extraordinary effort, but it is
unlikely that you will ever be happy with the results. You should probably
just
If I'm correct, you are talking about this
*or maybe here too:*
static firstSearcher warming in
solrconfig.xml
Thanks,
Novin
On Tue, 5 Jan
So still use Ere's suggestion. There's no reason at all to
search all million every time. If start=0, just search the first
N (say 1,000). Keep doing that until you don't get docs,
then add more docs.
Or fire off the first query and then, when you know there is
going to be pagination, fire off the
Assuming (and it wasn't clear from your problem statement) that you need
to search tokens in your field, this approach should be fine. I think Markus'
comment was assuming that you did _not_ need to search the
field. If you do, a copyField seems best.
Do be aware, though, that this will make for
What changes? You simply have "hot" and "cold" collections. When it comes time
to index data you:
1> create a collection
2> index to it.
3> use the Collections API to point your "active" collection to this new one
4> do whatever you want with the old one.
The setup is, of course, that your hot
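Step 3> is just the Collections API CREATEALIAS action. A little Python
sketch that only builds the admin URL (alias and collection names here are
examples; plug in your own HTTP client to actually send it):

```python
# Build the Collections API call that atomically repoints an alias, so
# step 3> ("point your 'active' collection to the new one") is a single
# HTTP request. Names below are examples only.
from urllib.parse import urlencode

def create_alias_url(alias, collection, base="http://localhost:8983/solr"):
    """CREATEALIAS makes 'alias' resolve to 'collection' for all queries."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return base + "/admin/collections?" + urlencode(params)

# 1> create docs_2016_01  2> index to it  3> swap the alias over:
print(create_alias_url("active", "docs_2016_01"))
```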
Matteo:
Let's see if I understand your problem. Essentially you want
Solr to analyze the filter queries and decide through some
algorithm which ones to cache. I have a hard time thinking of
any general way to do this; certainly there's nothing in Solr
that does this automatically. As Binoy
Might want to look into:
https://github.com/flaxsearch/luwak
or
https://github.com/OpenSextant/SolrTextTagger
-Original Message-
From: Will Moy [mailto:w...@fullfact.org]
Sent: Tuesday, January 05, 2016 11:02 AM
To: solr-user@lucene.apache.org
Subject: Many patterns against many
Hi Luca,
not sure if I understood well. Your question is
"Why are index times on a SolrCloud collection with 2 replicas higher than
on SolrCloud with 1 replica", right?
Well, with 2 replicas all docs have to be separately indexed in 2 places, and
Solr has to confirm that both indexing went
Hi
I would like to maintain two cores, for history data and current data, where
HDFS is my data source. My requirement is that data input should be given to
only one collection and previous data should be moved to the history collection.
1) Creating two cores and migrating data from current to history
Hi,
after looking at the presentation on cloudsearch from Lucene Revolution
2014
https://www.youtube.com/watch?v=RI1x0d-yO8A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP&index=49
min 17:08,
I realized I'd love to be able to remove the burden of disabling filter
query caching from developers.
the problem:
Hello,
I have a field which is defined as a TextField with a PatternTokenizer
which splits on ";".
Now, for one use case I need to use the /export handler to export this
field. As the /export handler needs the field to support docValues, if I try
to mark that field as docValues="true" it says
Hello - indeed, this is not going to work. But since you are using the token
filter as a preprocessor, you could easily use an update request processor
to do the preprocessing work for you. Check out the documentation; I think you
can use the RegexReplaceProcessor.
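For instance, an update chain along these lines (chain name, field name and
pattern are placeholders for your ";" case):

```xml
<updateRequestProcessorChain name="preprocess">
  <!-- rewrite each ';' in the incoming value before it is indexed;
       'myfield' stands in for your real field name -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">myfield</str>
    <str name="pattern">;</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```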
I concur with Erick and Upayavira that it is best to keep Tika in a separate
JVM...well, ideally a separate box or rack or even data center [0][1]. :)
But seriously, if you're using DIH/SolrCell, you have to configure Tika to
parse documents recursively. This was made possible in
Hello
Please may I have your advice as to whether Solr is a good tool for this
job?
We have (per year) –
Up to 50,000,000 sentences
And about 5,000 search patterns (i.e. queries)
Our task is to identify all matches between any sentence and any search
pattern.
That list of detections must be
If I understand your problem correctly, then you don't want the most
frequently used fqs removed and you do not want your filter cache to grow
to very large sizes.
Well, there is already a solution for both of these.
In the solrconfig.xml file, you can configure the <filterCache> parameters
to suit your needs.
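For reference, the element looks like this (the numbers are placeholders to
tune): LRU eviction keeps the most recently used fqs, size caps growth, and
autowarmCount repopulates the hottest entries when a new searcher opens.

```xml
<filterCache class="solr.LRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
```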
Hi Binoy,
I know these settings but the problem I'm trying to solve is when
these settings aren't enough.
2016-01-05 16:30 GMT+01:00 Binoy Dalal :
> If I understand your problem correctly, then you don't want the most
> frequently used fqs removed and you do not
What is your exact requirement then?
I ask, because these settings can solve the problems you've mentioned
without the need to add any additional functionality.
On Tue, Jan 5, 2016 at 9:04 PM Matteo Grolla
wrote:
> Hi Binoy,
> I know these settings but the problem
Well, if you already know that you need to display only the first 20
records, why not search only for them? Or if you don't know whether they
already exist, search for, say, a hundred, then a thousand, and so on until
you have enough.
Nevertheless, what's really needed for a good answer or ideas
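In code, that "grow the window" idea is tiny; fetch() below is a placeholder
for your actual Solr call, and the window sizes are just examples:

```python
# Ask for a small result window first and widen it only when more docs
# are needed. fetch(rows) stands in for a real Solr query returning up
# to 'rows' matching docs.
def results_upto(fetch, needed, start_rows=100, factor=10, max_rows=100_000):
    rows = start_rows
    while True:
        docs = fetch(rows)
        # stop when we have enough, or the window is already maxed out
        if len(docs) >= needed or rows >= max_rows:
            return docs[:needed]
        rows *= factor  # 100 -> 1,000 -> 10,000 -> ...

# toy stand-in: pretend the index holds 2,500 matching docs
print(len(results_upto(lambda rows: list(range(min(rows, 2500))), needed=2000)))
```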
Hi Erik,
the test was done on thousands of queries of that kind and millions of
docs.
I went from <1500 qpm to ~6000 qpm on modest virtualized hardware (CPU
bound, and CPU was scarce).
After that the customer was happy and time ran out, so I didn't go further,
but cost was definitely something I'd try
When I
Yep. Do note what's happening here. You're executing a query
that potentially takes 10 seconds to execute (based on your
earlier post). But you may be opening a new searcher every
2 seconds. You may start to see "too many on deck searchers"
in your log. If you do, do _not_ try to "fix" this by
&fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:,fq={!cache=false}type:
You have a comma in front of the last fq clause, typo?
Well, the whole point of caching filter queries is so that the
_second_ time you use it,
very little work has to be done. That comes at a cost of course for
@Eric I might be wrong here so please correct me if I am.
In the particular case that Matteo has given, applying the filters as post
filters won't make any difference, since the query is going to return all docs
anyway. In such a case, won't applying fqs normally be the same as applying
them as post
On 1/5/2016 9:59 AM, Zheng Lin Edwin Yeo wrote:
> I have uploaded the screenshot here
> https://www.dropbox.com/s/l5itfbaus1c9793/Memmory%20Usage.png?dl=0
>
> Basically, Java(TM) Platform SE Library, which Solr is running on, is only
> using about 22GB currently. However, the memory usage at the
Thanks for your reply @Ere Maijala,
one of my eCommerce clients has a requirement to search some
records based on IDs, like
IP:8083/select?q=ID:(1,4,7,...upto 1 Millions), displaying only 10 to 20
records.
If I use the above procedure it takes too much time, or if I am going to use
Thanks Joel Bernstein
could you share any links please?
Thanks for your response Ahmet.
Best,
Modassar
On Mon, Jan 4, 2016 at 5:07 PM, Ahmet Arslan
wrote:
> Hi,
>
> I think wildcard queries fl:networ* are re-written into Constant Score
> Query.
> fl=*,score should return the same score for all documents that are retrieved.
>
Thanks Markus.
@Erick Erickson thanks for reply,
Actually they gave me only this task: to search 1 million IDs with good
performance; results should appear within 50-100ms.
Yeah, I will fire off the full query (up to millions) in the background, but
what is the efficient way of doing it in terms of
Thanks Erick.
Yes, I was not clear in my question, but I want it to be searchable as a
TextField.
Well, you're serving the first set of results very quickly because you're
only looking for, say, the first 1,000. Thereafter you assemble the rest
of the result set in the background (and I'd use the export function) to
have your app have the next N ready for immediate response to the
user.
But
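A threading sketch of that pattern; search() is a placeholder for the real
Solr (or /export) call, and the chunk size is an example:

```python
import threading, queue

# Serve the first chunk immediately; assemble the remaining result set on
# a background thread so later pages can answer without a fresh search.
def serve(search, first_rows=1000):
    first = search(0, first_rows)          # fast: only the first chunk

    rest = queue.Queue()
    def prefetch():
        start = first_rows
        while True:
            batch = search(start, first_rows)
            if not batch:
                break
            rest.put(batch)
            start += first_rows
        rest.put(None)                     # sentinel: nothing more to read

    threading.Thread(target=prefetch, daemon=True).start()
    return first, rest                     # the app pages from 'rest' later

data = list(range(2500))                   # toy stand-in for 2,500 hits
first, rest = serve(lambda start, rows: data[start:start + rows])
print(len(first))  # 1000
```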
Hi Shawn,
Here is the new screenshot of the Memory tab of the Resource Monitor.
https://www.dropbox.com/s/w4bnrb66r16lpx1/Resource%20Monitor.png?dl=0
Yes, I found that the value under the "Working Set" column is much higher
than the others. Also, the value which I was previously looking at under
Hi,
*q=fl1:net*&fl=fl&rows=50&stats=true&stats.field={!cardinality=1.0}fl*
is returning a cardinality of around 15 million. It is taking around 4 minutes.
Similar response times are seen with different queries which yield high
cardinality. Kindly note that cardinality=1.0 is the desired goal.
Here in the above example the
You might get better answers if you'd describe your use-case. If, for
instance, you know all the IDs and you just need to be able to display a
hundred records among those millions quickly, it would make sense to
search for only a chunk of 100 IDs at a time. If you need to support
more search
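A rough Python sketch of that chunked approach (host, core and field names
are made up, and the {!terms} parser is just one cheap way to send an ID
batch):

```python
# Query only the IDs needed for the current page instead of sending all
# one million in a single request. Host/core/field names are examples.
from urllib.parse import urlencode

def solr_page_url(ids, page, rows=10,
                  base="http://localhost:8083/solr/core1/select"):
    """Build a select URL asking Solr for just one page's worth of IDs."""
    page_ids = ids[page * rows:(page + 1) * rows]
    # the {!terms} parser handles long ID lists more cheaply than a
    # huge boolean OR query
    q = "{!terms f=ID}" + ",".join(str(i) for i in page_ids)
    return base + "?" + urlencode({"q": q, "rows": rows})

ids = list(range(1, 1_000_001, 3))   # stand-in for the million-ID list
print(solr_page_url(ids, page=0))
```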
There are a number of map/reduce join implementations available in Trunk.
These are map/reduce joins where the entire result sets are shuffled to
worker nodes. All of this code is in the org.apache.solr.client.solrj.io.stream
package if you'd like to review.
Joel Bernstein
Hi Joel,
Sorry there was an error between my chair and keyboard; there isn't a bug
- the right hand stream was not ordered by the joined-on field. So, the
following query does what I expected:
http://localhost:8983/solr/gettingstarted/stream?stream=innerJoin(search(gettingstarted
Akiel,
https://issues.apache.org/jira/browse/SOLR-7554 added checks on the sort
with streams, where required. If a particular stream requires that incoming
streams be ordered in a compatible way then that check will be performed
during creation of the stream and an error will be thrown if that
Binoy:
bq: In such a case won't applying fqs normally be the same as applying
them as post filters
Certainly not, at least AFAIK...
By definition, regular FQs are calculated over the entire corpus
(no, NOT just the docs that satisfy the query). Then that entire
bitset is stored in the
You could send the documents to both and filter out the recent ones in the
history collection.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Jan 5, 2016, at 5:46 AM, vidya wrote:
>
> Hi
>
> I would like to maintain two
I want to get the field size (in KB or MB) as it is stored on disk. That
approach might not give that info.
On Monday, January 4, 2016, Upayavira wrote:
>
> Solr does store the term positions, but you won't find it easy to
> extract them, as they are stored against terms
The field is not stored in a discrete place, rather it is mixed up with
all other field/document data. Therefore, I would suggest that
attempting to discern the disk space consumed by a single field would be
a futile endeavour.
Upayavira
On Tue, Jan 5, 2016, at 12:04 PM, KNitin wrote:
> I want
Hi guys,
I'm having trouble figuring out what would be the ideal Solr config for a
case where:
I'm doing a hard commit every minute for a very small number of users,
because I have to show those docs in search results quickly when a user saves
changes.
It is causing the response to take around 2 secs to show even
You should use Solr soft commits for this use case. So, setting soft commit
to 5 seconds and autoCommit to a minute with openSearcher=false should do
the trick:
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>2000</maxTime>
</autoSoftCommit>
Reference link-
https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
To know more about