Hi,
When doing a distributed query from Solr 4.10.4, we get the exception below:
org.apache.solr.common.SolrException: org.apache.http.ParseException: Invalid
content type:
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
On 12/18/2014 12:35 AM, rashi gandhi wrote:
Also, as per our investigation, there is currently ongoing work in the Solr
community to support this concept of distributed/global IDF. But I wanted
to know if there is any solution possible right now to manage/control the
score of the documents during
Hi,
This is regarding the issue that we are facing with SOLR distributed search.
In our application, we are managing multiple shards at SOLR server to
manage the load. But there is a problem with the order of the results that we
are going to return to the client during the search.
For Example: Currently
Hi,
Will this *shards* parameter also work in the near future with Solr 5?
With Regards
Aman Tandon
On Thu, Jun 5, 2014 at 2:59 PM, Mahmoud Almokadem prog.mahm...@gmail.com
wrote:
Hi, you can search using this sample Url
On 6/6/2014 6:25 AM, Aman Tandon wrote:
Will this *shards* parameter also work in the near future with Solr 5?
I am not aware of any plan to deprecate or remove the shards parameter.
My personal experience is with versions from 1.4.0 through 4.7.2. It
works in all of those versions. Without
Thanks Shawn.
In my organisation we also want to implement SolrCloud, but the problem
is that we are using the master-slave architecture and do all indexing on the
master, whose hardware is lower-spec than the slaves'.
So if we implement SolrCloud in a fashion that the master will be the
On 6/6/2014 8:31 AM, Aman Tandon wrote:
In my organisation we also want to implement SolrCloud, but the problem
is that we are using the master-slave architecture and do all indexing on the
master, whose hardware is lower-spec than the slaves'.
So if we implement the solrcloud in a
Thanks Shawn, I will try to think in that way too :)
With Regards
Aman Tandon
On Fri, Jun 6, 2014 at 8:19 PM, Shawn Heisey s...@elyograg.org wrote:
On 6/6/2014 8:31 AM, Aman Tandon wrote:
In my organisation we also want to implement the solrcloud, but the problem
is that, we are using the
Hi,
Can you please help me with Solr distributed search in multicore? I would
be very happy, as I am stuck here.
In Java code, how do I implement distributed search?
--
Thanks & Regards
Anurag Verma
Hi, you can search using this sample Url
http://localhost:8080/solr/core1/select?q=*:*&shards=localhost:8080/solr/core1,localhost:8080/solr/core2,localhost:8080/solr/core3
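For reference, the same request could be assembled from Java without any Solr client library; this is only a sketch using the JDK (the hosts and core names come from the sample URL above, and `buildQuery` is a hypothetical helper, not a Solr API):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ShardsQuery {
    // Assemble a /select URL with a shards parameter, like the sample above.
    static URI buildQuery(String coreBase, String q, String... shards) {
        String query = "q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                     + "&shards=" + String.join(",", shards);
        return URI.create(coreBase + "/select?" + query);
    }

    public static void main(String[] args) {
        URI uri = buildQuery("http://localhost:8080/solr/core1", "*:*",
                "localhost:8080/solr/core1",
                "localhost:8080/solr/core2",
                "localhost:8080/solr/core3");
        System.out.println(uri); // ready to fetch with any HTTP client
    }
}
```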
Mahmoud Almokadem
On Thu, Jun 5, 2014 at 8:13 AM, Anurag Verma vermanur...@gmail.com wrote:
Hi,
Can you please
Here is an example of schema design: a PDF file of 5MB might have
maybe 50k of actual text. The Solr ExtractingRequestHandler will find
that text and index only that. If you set the field to stored=true,
the 5MB will be saved. If stored=false, the PDF is not saved. Instead,
you would store a link
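A sketch of what such a schema fragment might look like (the field and type names here are illustrative assumptions, not from the original message):

```xml
<!-- Illustrative schema.xml fragment: index the extracted text without
     storing it, and store only a small field linking to the original PDF. -->
<field name="content" type="text_general" indexed="true" stored="false"/>
<field name="doc_url" type="string" indexed="false" stored="true"/>
```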
This copying is a bit overstated here because of the way that small
segments are merged into larger segments. Those larger segments are then
copied much less often than the smaller ones.
While you can wind up with lots of copying in certain extreme cases, it is
quite rare. In particular, if you
For data of this size you may want to look at something like Apache
Cassandra, which is made specifically to handle data at this kind of
scale across many machines.
You can still use Hadoop to analyse and transform the data in a
performant manner, however it's probably best to do some research on
Hi,
I have a basic question: let's say we're going to have a very, very huge set
of data, such that we will surely need many servers (tens or hundreds of
servers).
We will also need failover.
Now the question is whether we should use Hadoop, or whether Solr Distributed
Search with shards would be enough?
I've read lots of articles like:
http://www.lucidimagination.com/content/scaling-lucene-and-solr
http://wiki.apache.org/solr/DistributedSearch
But I'm still confused; Solr's distributed search seems to be able to
handle splitting
Interesting info.
You should look into using Solid State Drives. I moved my search engine to
SSD and saw dramatic improvements.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Huge-Performance-Solr-distributed-search-tp3530627p346.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi all again. Thanks to all for your replies.
This weekend I made some interesting tests, and I would like to share them
with you.
First of all I made speed test of my hdd:
root@LSolr:~# hdparm -t /dev/sda9
/dev/sda9:
Timing buffered disk reads: 146 MB in 3.01 seconds = 48.54 MB/sec
The problem has been resolved. My disk subsystem was the bottleneck for quick search.
I put my indexes in RAM and I now see very nice QTimes :)
Sorry for your time, guys.
On Mon, Nov 28, 2011 at 4:02 PM, Artem Lokotosh arco...@gmail.com wrote:
Hi all again. Thanks to all for your replies.
On this
45,000,000 docs per shard approx, Tomcat; caching was tweaked in solrconfig, and each
shard was given 12GB of RAM max.
<!-- Filter Cache
Cache used by SolrIndexSearcher for filters (DocSets),
unordered sets of *all* documents that match a query. When a
new searcher is opened, its
On 11/25/2011 3:13 AM, Mark Miller wrote:
When you search each shard, are you positive that you are using all of the
same parameters? You are sure you are hitting request handlers that are
configured exactly the same and sending exactly the same queries?
In my experience, the overhead for
In general terms, when your Java heap is so large, it is beneficial to
set -Xms and -Xmx to the same size.
On Wed, Nov 23, 2011 at 5:12 AM, Artem Lokotosh arco...@gmail.com wrote:
Hi!
* Data:
- Solr 3.4;
- 30 shards ~ 13GB, 27-29M docs each shard.
* Machine parameters (Ubuntu 10.04 LTS):
Can you merge, e.g. 3 shards together, or is it much effort for your
team?
Yes, we can merge. We'll try to do this and review how it works.
Merge does not help :( I've tried to merge two shards into one, and three
shards into one, but the results are similar to those of the first configuration
with 30
How big are the documents you return (how many fields, avg KB per doc, etc.)?
I have the following schema in my Solr configuration:
<fields>
  <field name="field1" type="text" indexed="true" stored="false"/>
  <field name="field2" type="text" indexed="true" stored="true"/>
  <field name="field3" type="text" indexed="true"
On Thu, Nov 24, 2011 at 12:09 PM, Artem Lokotosh arco...@gmail.com wrote:
How big are the documents you return (how many fields, avg KB per doc,
etc.)?
I have the following schema in my Solr configuration:
<fields>
  <field name="field1" type="text" indexed="true" stored="false"/>
  <field name="field2" type="text"
Hi!
* Data:
- Solr 3.4;
- 30 shards ~ 13GB, 27-29M docs each shard.
* Machine parameters (Ubuntu 10.04 LTS):
user@Solr:~$ uname -a
Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011
x86_64 GNU/Linux
user@Solr:~$ cat /proc/cpuinfo
processor : 0 - 3
vendor_id :
Hello,
Is this log from the frontend SOLR (aggregator) or from a shard?
Can you merge, e.g. 3 shards together or is it much effort for your team?
In our setup we currently have 16 shards with ~30GB each, but we rarely
search in all of them at once.
Best,
Dmitry
On Wed, Nov 23, 2011 at 3:12 PM,
Is this log from the frontend SOLR (aggregator) or from a shard?
from aggregator
Can you merge, e.g. 3 shards together or is it much effort for your team?
Yes, we can merge. We'll try to do this and review how it works.
Thanks, Dmitry
Any another ideas?
On Wed, Nov 23, 2011 at 4:01 PM,
If the response time from each shard shows decent figures, then the aggregator
seems to be the bottleneck. Do you btw have a lot of concurrent users?
On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh arco...@gmail.com wrote:
Is this log from the frontend SOLR (aggregator) or from a shard?
from
If the response time from each shard shows decent figures, then the aggregator
seems to be the bottleneck. Do you btw have a lot of concurrent users?
For now it is not a problem, but we expect from 1K to 10K concurrent users, and maybe
more
On Wed, Nov 23, 2011 at 4:43 PM, Dmitry Kan
If you request 1000 docs from each shard, then the aggregator is really
fetching 30,000 total documents, which it must then merge (re-sort the
results and take the top 1000 to return to the client). It's possible that
Solr's merging implementation needs to be optimized, but it does not seem like
it could be that slow.
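To make the aggregator's merge step concrete, here is a rough, self-contained sketch (not Solr's actual code; the ids and scores are made up) of merging per-shard result lists into a global top-N with a bounded priority queue:

```java
import java.util.*;

public class ShardMerge {
    record ShardDoc(String id, float score) {}

    // Merge per-shard result lists and keep only the global top-n by score,
    // roughly what the aggregator must do with 30,000 fetched documents.
    static List<ShardDoc> mergeTopN(List<List<ShardDoc>> shardResults, int n) {
        // Min-heap of size <= n: the root is always the weakest candidate.
        PriorityQueue<ShardDoc> heap =
            new PriorityQueue<>(Comparator.comparingDouble(ShardDoc::score));
        for (List<ShardDoc> shard : shardResults)
            for (ShardDoc d : shard) {
                heap.offer(d);
                if (heap.size() > n) heap.poll(); // evict current lowest score
            }
        List<ShardDoc> top = new ArrayList<>(heap);
        top.sort(Comparator.comparingDouble(ShardDoc::score).reversed());
        return top;
    }

    public static void main(String[] args) {
        List<List<ShardDoc>> shards = List.of(
            List.of(new ShardDoc("a", 3.0f), new ShardDoc("b", 1.0f)),
            List.of(new ShardDoc("c", 2.5f), new ShardDoc("d", 0.5f)));
        System.out.println(mergeTopN(shards, 2)); // highest-scoring 2 of the 4
    }
}
```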
Hi all,
Now I'm doing research on Solr distributed search, and it is said that with
more than one million documents it is reasonable to use distributed search.
So I want to know, does anyone have test results (such as time cost) for using a
single index versus distributed search with more than one million documents? I
To: solr-user@lucene.apache.org; d...@lucene.apache.org
Subject: About solr distributed search
Hi all,
Now I'm doing research on solr distributed search, and it is said documents
more than one million is reasonable to use distributed search.
So I want to know, does anyone have the test result
a
distributed productive system.
Kind Regards
Gregor
On 09/29/2011 12:14 PM, Pengkai Qin wrote:
Hi all,
Now I'm doing research on solr distributed search, and it is said
documents more than one million is reasonable to use distributed search.
So I want to know, does anyone have the test result
Hi all,
Now I'm doing research on solr distributed search, and it
is said documents more than one million is reasonable to use
distributed search.
So I want to know, does anyone have the test
result(Such as time cost) of using single index and distributed search
of more than one million data
hi
I suggest you set up an environment and test it yourself; this small amount of data (1M docs) is no problem at all.
2011/9/30 秦鹏凯 qinpeng...@yahoo.cn:
Hi all,
Now I'm doing research on solr distributed search, and it
is said documents more than one million is reasonable to use
distributed search.
So I want to know, does anyone have the test
result(Such as time cost
<requestHandler name="MYREQUESTHANDLER" class="solr.SearchHandler">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="facet.method">enum</str>
    <str name="facet.mincount">1</str>
    <str name="facet.limit">10</str>
    <str
hi all,
I followed the wiki http://wiki.apache.org/solr/SpellCheckComponent
but there is something wrong.
The URL given by the wiki is
Hi,
I do not use spell, but I use distributed search; using qt=spell is correct,
you should not use qt=\spell.
For shards, I specify it in solrconfig directly, not in url, but should
work the same.
Maybe an issue in your spell request handler.
2011/8/19 Li Li fancye...@gmail.com
hi all,
I
could you please show me your configuration in solrconfig.xml?
On Fri, Aug 19, 2011 at 5:31 PM, olivier sallou
olivier.sal...@gmail.com wrote:
Hi,
I do not use spell but I use distributed search, using qt=spell is correct,
should not use qt=\spell.
For shards, I specify it in solrconfig
On Tue, 2010-10-26 at 15:48 +0200, Ron Mayer wrote:
And a third potential reason: it's arguably a feature instead of a bug
for some applications. Depending on how I organize my shards, "give me
the most relevant document from each shard for this search" seems like
it could be useful.
You can
Andrzej Bialecki wrote:
On 2010-10-25 11:22, Toke Eskildsen wrote:
On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
But it shows a problem of distributed search without common idf:
a doc will get different scores in different shards.
Bingo.
I really don't understand why this fundamental problem
On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
But it shows a problem of distributed search without common idf:
a doc will get different scores in different shards.
Bingo.
I really don't understand why this fundamental problem with sharding
isn't mentioned more often. Every time the advice use
On 2010-10-25 11:22, Toke Eskildsen wrote:
On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
But it shows a problem of distributed search without common idf:
a doc will get different scores in different shards.
Bingo.
I really don't understand why this fundamental problem with sharding
isn't
On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote:
* there is an exact solution to this problem, namely to make two
distributed calls instead of one (first call to collect per-shard IDFs
for given query terms, second call to submit a query rewritten with the
global IDF-s). This
On 2010-10-25 13:37, Toke Eskildsen wrote:
On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote:
* there is an exact solution to this problem, namely to make two
distributed calls instead of one (first call to collect per-shard IDFs
for given query terms, second call to submit a query
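The two-call scheme described above can be sketched as follows: each shard reports its per-term document frequency and document count, and the aggregator combines them into one global IDF per term. This is only an illustration (the shard numbers are made up); the formula is the classic Lucene-style `idf = 1 + ln(N / (df + 1))`:

```java
public class GlobalIdf {
    // First call: each shard reports (docFreq, maxDoc) for a query term.
    // The aggregator sums them and computes one global IDF for the term,
    // which is then shipped back with the rewritten query (second call).
    static double globalIdf(long totalDocs, long totalDocFreq) {
        return 1.0 + Math.log((double) totalDocs / (totalDocFreq + 1));
    }

    public static void main(String[] args) {
        long[][] perShard = { {5, 1000}, {50, 1000}, {0, 500} }; // {df, maxDoc} per shard
        long df = 0, n = 0;
        for (long[] s : perShard) { df += s[0]; n += s[1]; }
        System.out.printf("global idf = %.4f%n", globalIdf(n, df));
    }
}
```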
where is the link of this patch?
2010/7/24 Yonik Seeley yo...@lucidimagination.com:
On Fri, Jul 23, 2010 at 2:23 PM, MitchK mitc...@web.de wrote:
why do we do not send the output of TermsComponent of every node in the
cluster to a Hadoop instance?
Since TermsComponent does the map-part of the
The Solr version I used is 1.4.
2010/7/26 Li Li fancye...@gmail.com:
where is the link of this patch?
2010/7/24 Yonik Seeley yo...@lucidimagination.com:
On Fri, Jul 23, 2010 at 2:23 PM, MitchK mitc...@web.de wrote:
why do we do not send the output of TermsComponent of every node in the
distributed IDF
(like in the mentioned JIRA issue) to normalize your results' scoring.
But the mentioned problem in this mailing-list posting has nothing to do
with that...
Regards
- Mitch
On Fri, Jul 23, 2010 at 2:23 PM, MitchK mitc...@web.de wrote:
why do we do not send the output of TermsComponent of every node in the
cluster to a Hadoop instance?
Since TermsComponent does the map-part of the map-reduce concept, Hadoop
only needs to reduce the stuff. Maybe we even do not need
other suggestions?
-Yonik
http://www.lucidimagination.com
That only works if the docs are exactly the same - they may not be.
Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
shouldn't they?
On Fri, Jul 23, 2010 at 2:40 PM, MitchK mitc...@web.de wrote:
That only works if the docs are exactly the same - they may not be.
Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
shouldn't they?
Documents aren't supposed to be duplicated across shards... so the
presence
As the comments suggest, it's not a bug, but just the best we can do
for now since our priority queues don't support removal of arbitrary
elements. I guess we could rebuild the current priority queue if we
detect a duplicate, but that will have an obvious performance impact.
Any other
: As the comments suggest, it's not a bug, but just the best we can do
: for now since our priority queues don't support removal of arbitrary
FYI: I updated the DistributedSearch wiki to be more clear about this --
it previously didn't make it explicitly clear that docIds were supposed to
be
In QueryComponent.mergeIds it will remove documents which have
duplicated uniqueKeys with others. In the current implementation, it uses
the first encountered:
String prevShard = uniqueDoc.put(id, srsp.getShard());
if (prevShard != null) {
// duplicate detected
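A tiny self-contained sketch of that first-encountered-wins behaviour (hypothetical ids and shard names; this mimics the idea, not Solr's actual implementation):

```java
import java.util.*;

public class DedupeMerge {
    // Mimic QueryComponent.mergeIds: when the same uniqueKey arrives from a
    // second shard, keep the first-encountered shard's document.
    static List<String> mergeFirstWins(String[][] hits) {
        Map<String, String> uniqueDoc = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String[] h : hits) {
            String prevShard = uniqueDoc.put(h[0], h[1]);
            if (prevShard != null) {
                uniqueDoc.put(h[0], prevShard); // duplicate: restore first shard
            } else {
                kept.add(h[0]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[][] hits = { {"doc1", "shardA"}, {"doc2", "shardA"},
                            {"doc1", "shardB"} }; // doc1 duplicated across shards
        System.out.println(mergeFirstWins(hits)); // shardB's doc1 is dropped
    }
}
```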
Li Li,
this is the intended behaviour, not a bug.
Otherwise you could get back the same record in a response several
times, which may not be intended by the user.
Kind regards,
- Mitch
you can't prevent this without custom coding or
making a document's occurrence unique.
Kind regards,
- Mitch
How about sorting over the score? Would that be possible?
On Jul 21, 2010, at 12:13 AM, Li Li wrote:
in QueryComponent.mergeIds. It will remove documents which have
duplicated uniqueKeys with others. In the current implementation, it uses
the first encountered.
String prevShard =
sees
doc_X first at shard_A and ignores it at shard_B. That means that the
doc might occur at page 10 in pagination, although it *should* occur
at page 1 or 2.
Kind regards,
- Mitch
);
response = solr.query(query);
SolrDocumentList docs = response.getResults();
long docNum = docs.getNumFound();
--
View this message in context:
http://www.nabble.com/Solr-Distributed-Search-throws-org.apache.solr.common.SolrException%3A-Form_too_large-Exception-tp24295114p24295114
Hi Mark,
I actually got this error because I was using an old version of
Java. Now the problem is solved.
Thanks anyway,
Raakhi
On Tue, Jun 9, 2009 at 11:17 AM, Rakhi Khatwani rkhatw...@gmail.com wrote:
Hi Mark,
Yeah, I would like to open a JIRA issue for it. How do I go
Thanks for bringing closure to this Raakhi.
- Mark
Rakhi Khatwani wrote:
Hi Mark,
I actually got this error because I was using an old version of
Java. Now the problem is solved.
Thanks anyway,
Raakhi
On Tue, Jun 9, 2009 at 11:17 AM, Rakhi Khatwani rkhatw...@gmail.com wrote:
Hi,
I was executing a simple example which demonstrates DistributedSearch.
example provided in the following link:
http://wiki.apache.org/solr/DistributedSearch
However, when I start up the server on both ports, 8983 and 7574, I get
the following exception:
SEVERE: Could not start
Hi Mark,
Yeah, I would like to open a JIRA issue for it. How do I go about
that?
Regards,
Raakhi
On Mon, Jun 8, 2009 at 7:58 PM, Mark Miller markrmil...@gmail.com wrote:
That is a very odd cast exception to get. Do you want to open a JIRA issue
for this?
It looks like an odd
On Fri, Apr 10, 2009 at 7:50 AM, vivek sar vivex...@gmail.com wrote:
Just an update. I changed the schema to store the unique id field, but
I still get the connection reset exception. I did notice that if there
is no data in the core then it returns the 0 result (no exception),
but if there
Yes, it's all new indexes. I can search them individually, but adding
shards throws a Connection Reset error. Is there any way I can debug
this, or any other pointers?
-vivek
On Fri, Apr 10, 2009 at 4:49 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
On Fri, Apr 10, 2009 at 7:50 AM,
Hi,
I've another thread on multi-core distributed search, but just
wanted to put a simple question here on distributed search to get some
response. I've a search query,
http://etsx19.co.com:8080/solr/20090409_9/select?q=usa -
returns 10 results.
Now if I add the shards parameter to it,
I think I found the reason behind the connection reset. Looking at the
code, it points to QueryComponent.mergeIds():
resultIds.put(shardDoc.id.toString(), shardDoc);
It looks like the doc unique id is returning null. I'm not sure how that is
possible, as it's a required field. Right now my unique id is not stored
Just an update. I changed the schema to store the unique id field, but
I still get the connection reset exception. I did notice that if there
is no data in the core then it returns the 0 result (no exception),
but if there is data and you search using shards parameter I get the
connection reset