On Thu, 2010-09-02 at 03:37 +0200, Lance Norskog wrote:
I don't know how much SSD disks cost, but they will certainly cure the
disk i/o problem.
We've done a fair amount of experimentation in this area (1997-era SSDs
vs. two 15,000 RPM harddisks in RAID 1 vs. two 10,000 RPM harddisks in
RAID
On Fri, 2010-09-03 at 03:45 +0200, Shawn Heisey wrote:
On 9/2/2010 2:54 AM, Toke Eskildsen wrote:
We've done a fair amount of experimentation in this area (1997-era SSDs
vs. two 15,000 RPM harddisks in RAID 1 vs. two 10,000 RPM harddisks in
RAID 0). The harddisk setups never stood a chance
On Fri, 2010-09-03 at 11:07 +0200, Dennis Gearon wrote:
If you really want to see performance, try external DRAM disks.
Whew! 800X faster than a disk.
As sexy as they are, DRAM drives do not buy much extra
performance. At least not at the search stage. For searching, SSDs are
not
From: Dennis Gearon [gear...@sbcglobal.net]:
I wouldn't have thought that CPU was a big deal, with the speed/cores of CPUs
continuously growing according to Moore's law and disk speed
barely changing 50% in 15 years. Must have a lot to do with caching.
I am not sure I follow you?
enough to get a test-machine with 2 types
of SSD, 2 10,000 RPM harddisks and 2 15,000 RPM harddisks. Some quick
notes can be found at http://wiki.statsbiblioteket.dk/summa/Hardware
The world has moved on since then, but that has only widened the gap
between SSDs and harddisks.
Regards,
Toke
On Fri, 2012-11-16 at 02:18 +0100, Buttler, David wrote:
Obviously, I could replicate the data so
that I wouldn't lose any documents while I replace my disk, but since I
am already storing the original data in HDFS (with 3x replication),
adding additional replication for solr eats into my
On Mon, 2012-11-19 at 08:10 +0100, Bernd Fehling wrote:
I think there is already a BETA available:
http://luke.googlecode.com/svn/trunk/
You might try that one.
That doesn't work either for Lucene 4.0.0 indexes, same for source
trunk. I did have some luck with downloading the source and
could reduce memory consumption to 1/10 of the worst
case 7GB, if the values are fairly uniform. Of course, if the values are
all over the place, this gains you nothing at all.
Regards,
Toke Eskildsen
/solr/HierarchicalFaceting
Regards,
Toke Eskildsen
,
Toke Eskildsen, State and University Library, Denmark
expect to handle,
what do you expect a query to look like, how should the result be presented?
Regards,
Toke Eskildsen
to the documents they belong to. The penalty for having
thousands or millions of terms as compared to tens or hundreds in a field in an
inverted index is very small.
We're still in "any random machine you've got available"-land, so I second
Michael's suggestion.
Regards,
Toke Eskildsen
Regards,
Toke Eskildsen
On Tue, 2013-02-19 at 18:39 +0100, chamara wrote:
Hi, thanks Shawn for the input. Yes, I am using SolrCloud to replicate the
index to another server running with the same spec, 32 cores and 72GB RAM
on each machine. Do I have to test the performance of RAID 10? Have you ever
done a deployment
On Wed, 2013-02-20 at 10:06 +0100, Erik Dybdahl wrote:
However, after defining
<field name="customerField_*" type="string" indexed="true"
stored="true" multiValued="true"/>
Seems like a typo to me: You need to write dynamicField, not
field, when defining a dynamic field.
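A corrected definition would look something like this (assuming the same attributes as in the original):

```xml
<dynamicField name="customerField_*" type="string"
              indexed="true" stored="true" multiValued="true"/>
```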
Regards,
Toke Eskildsen
On Fri, 2011-09-09 at 18:48 +0200, Mike Austin wrote:
Our index is very small with 100k documents and a light load at the moment.
If I wanted to use the smallest possible RAM on the server, how would I do
this and what are the issues?
The index size depends just as much on the size of the
On Thu, 2011-09-15 at 22:54 +0200, Pulkit Singhal wrote:
Has anyone ever had to create large mock/dummy datasets for test
environments or for POCs/Demos to convince folks that Solr was the
wave of the future?
Yes, but I did it badly. The problem is that real data are not random so
any simple
On Sun, 2011-09-25 at 22:00 +0200, Ikhsvaku S wrote:
Documents: We have close to ~12 million XML docs, of varying sizes average
size 20 KB. These documents have 150 fields, which should be searchable and
indexed. [...] Approximately ~6000 such documents are updated and 400-800 new
ones are added
On Tue, 2011-09-27 at 02:43 +0200, Bictor Man wrote:
Thanks for your replies. Indeed, the filesystem caching seems to be the
difference. Sadly I can't add more memory and the 6GB/20-core combination
doesn't work, so I'll just try to tweak it as much as I can.
A (better) alternative to more
On Wed, 2011-09-28 at 12:58 +0200, Frederik Kraus wrote:
- 10 shards per server (needed for response times) running in a single tomcat
instance
Have you tested that sharding actually decreases response times in your
case? I see the idea in decreasing response times with sharding at the
cost of
sharding is
given, it should be followed with a "but be aware that it will make
relevance ranking unreliable".
Regards,
Toke Eskildsen
, logical grouping and distributed IDF?
Regards,
Toke Eskildsen
On Mon, 2010-10-25 at 09:41 +0200, Mathias Walter wrote:
[...] I enabled the field cache for my ID field and another
single char field (PAS type) to get the benefit of accessing
the fields with an array. Unfortunately, the IDs are too
large to fit in memory. I gave 12 GB of RAM to each node
.
The problem is of course to judge the quality of the outputs, but
setting the single index as the norm and plotting the differences in
document positions in the result sets might provide some insight.
Regards,
Toke Eskildsen
working idea. Maybe Varun
could comment on the maximum numbers of terms that his queries will
contain?
Regards,
Toke Eskildsen
On Wed, 2010-10-27 at 15:02 +0200, Mike Sokolov wrote:
Right - my point was to combine this with the previous approaches to
form a query like:
samsung AND android
On Wed, 2010-10-27 at 14:20 +0200, mike anderson wrote:
[...] By my simple math, this would mean that if we want each shard's
index to be able to fit in memory, [...]
Might I ask why you're planning on using memory-based sharding? The
performance gap between memory and SSDs is not very big so
Jonathan Rochkind [rochk...@jhu.edu] wrote:
I too sometimes have similar use cases, and my best ideas about how to
solve them involve using faceting --- you can facet on a multi-valued
field, and you can sort facets--but you can only sort facets by index
order, a strict byte-by-byte sort.
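For reference, index-order facet sorting is requested with the facet.sort parameter (the field name here is hypothetical):

```
q=*:*&facet=true&facet.field=subject&facet.sort=index
```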
.
Regards,
Toke Eskildsen
On Fri, 2010-10-29 at 10:18 +0200, RL wrote:
Executing a query and sorting by this field leads to unnatural sorting of:
string1
string10
string2
That's very much natural. Numbers are not treated any differently from
words made up of letters. You have to use alignment if you want to use
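The effect is easy to reproduce outside Solr; a quick illustration of why zero-padding (alignment) restores the expected order:

```python
# Lexicographic sorting treats digits as ordinary characters,
# so "string10" sorts before "string2".
names = ["string1", "string10", "string2"]
print(sorted(names))   # ['string1', 'string10', 'string2']

# Zero-padding ("alignment") the numeric part restores natural order:
padded = ["string01", "string10", "string02"]
print(sorted(padded))  # ['string01', 'string02', 'string10']
```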
On Fri, 2010-10-29 at 10:06 +0200, Mark Allan wrote:
For me, I simply deleted the original email, but I'm now quite
enjoying the irony of the complaints causing more noise on the list
than the original email! ;-)
He he. An old classic. Next in line is the meta-meta-discussion about
Lance Norskog [goks...@gmail.com] wrote:
It would be handy to have an auto-incrementing date field, so that
each document would get a unique number and the timestamp would then
be the unique ID of the document.
If someone wants to implement this, I'll just note that the granularity of Solr
Dennis Gearon [gear...@sbcglobal.net] wrote:
Even microseconds may not be enough on some really good, fast machine.
True, especially since the timer might not provide microsecond granularity
although the returned value is in microseconds. However, a unique timestamp
generator should keep
Dennis Gearon [gear...@sbcglobal.net] wrote:
how about a timestamp with a GUID appended on the end of it?
Since long (8 bytes) is the largest atomic type supported by Java, this would
have to be represented as a String (or rather BytesRef) and would take up 4 +
32 bytes + 2 * 4 bytes
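A minimal sketch of such a generator (my assumption of what "should keep" leads to: keep the last issued value and bump it on collision; single-threaded, so a real version would need locking):

```python
import time

_last_us = 0

def unique_timestamp_us() -> int:
    """Return a strictly increasing microsecond timestamp.

    If the clock has not advanced since the last call (or went
    backwards), bump the previous value by one microsecond.
    Single-threaded sketch; a real generator needs locking.
    """
    global _last_us
    now = int(time.time() * 1_000_000)
    if now <= _last_us:
        now = _last_us + 1
    _last_us = now
    return now

ids = [unique_timestamp_us() for _ in range(5)]
print(ids == sorted(ids), len(set(ids)) == 5)  # True True
```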
On Mon, 2010-11-15 at 06:35 +0100, lu.rongbin wrote:
In addition, my index has only two stored fields, id and price; the other
fields are indexed. I increased the document and query cache. The EC2
m2.4xLarge instance has 8 cores and 68GB memory. The total index size is about 100GB.
Looking at
On Tue, 2010-12-14 at 06:07 +0100, Cameron Hurst wrote:
[Cameron expected 150MB overhead]
As I start to index data and passing queries to the database I notice a
steady rise in the RAM but it doesn't stop at 150MB. If I continue to
reindex the exact same data set with no additional data
Stijn Vanhoorelbeke [stijn.vanhoorelb...@gmail.com] wrote:
I want to do some quick-and-dirty load testing - but all my results are cached.
I commented out all the Solr caches - but still everything is cached.
* Can the caching come from the 'Field Collapsing Cache'?
-- although I don't see this
On Sat, 2011-01-01 at 03:06 +0100, Tri Nguyen wrote:
I remember going through some page that had graphs of response times based on
index size for solr.
Anyone know of such pages?
Sorry, no. Some small scale tests with our corpus showed that response
times suffered less than proportionally
is the same regardless of the number of drives.
If your current response time for a single user is satisfactory, adding
drives is a viable solution for you. I'll still recommend the SSD option
though, as it will also lower the response time for a single query.
Regards,
Toke Eskildsen
On Mon, 2011-01-10 at 21:43 +0100, Paul wrote:
I see from your other messages that these indexes all live on the same
machine.
You're almost certainly I/O bound, because you don't have enough memory for
the
OS to cache your index files. With 100GB of total index size, you'll get
On Fri, 2011-01-14 at 13:05 +0100, Cathy Hemsley wrote:
I hope you can help. We are migrating our intranet web site management
system to Windows 2008 and need a replacement for Index Server to do the
text searching. I am trying to establish if Lucene and Solr is a feasible
replacement, but I
[] ASF Mirrors (linked in our release announcements or via the Lucene website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[X] I/we build them from source via an SVN/Git checkout.
[X] Other (someone in your company mirrors them internally or via a
downstream project)
On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote:
I was wondering if there are binary operation filters? Haven't seen any in the
book nor was able to find any using google.
So if I had 0600(octal) in a permission field, and I wanted to return any
records that 'permission
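In plain code terms, the kind of test the poster seems to be after is a bitwise mask check (hypothetical data; Solr has no built-in bitwise filter, so this would need a function query or custom code on the Solr side):

```python
# Documents with octal permission fields; find those whose permission
# bits include everything in the mask (owner read+write here).
permissions = {"docA": 0o600, "docB": 0o644, "docC": 0o044}
mask = 0o600

matching = [doc for doc, perm in permissions.items() if perm & mask == mask]
print(matching)  # ['docA', 'docB']
```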
On Tue, 2011-01-11 at 12:12 +0100, Julien Piquot wrote:
I would like to be able to prune my search result by removing the less
relevant documents. I'm thinking about using the search score: I use
the search scores of the document set (I assume they are sorted in
descending order),
On Tue, 2011-01-25 at 10:20 +0100, Salman Akram wrote:
Cache warming is a good option too, but the index gets updated every hour so
I am not sure how much that would help.
What is the time difference between queries with a warmed index and a
cold one? If the warmed index performs satisfactorily, then
On Wed, 2011-03-16 at 13:05 +0100, Dmitry Kan wrote:
Hello guys. We are using sharded Solr 1.4 for heavy faceted search over a
trigrams field with about 1 million entries in the result set and more
than 100 million entries to facet on in the index. Currently the faceted
search is very
On Wed, 2011-03-16 at 18:36 +0100, Erik Hatcher wrote:
Sorry, I missed the original mail on this thread
I put together that hierarchical faceting wiki page a couple
of years ago when helping a customer evaluate SOLR-64 vs.
SOLR-792 vs. other approaches. Since then, SOLR-792 morphed
and
On Thu, 2011-06-16 at 12:39 +0200, Tommaso Teofili wrote:
Do you know if it is possible to show the facets for a particular field
related only to the first N docs of the total number of results?
It collides with the inner workings of Solr, as faceting does not process
the doc-IDs from the
On Wed, 2011-06-29 at 09:35 +0200, eks dev wrote:
In MMAP, you need to have really smart warm up (MMAP) to beat IO
quirks, for RAMDir you need to tune gc(), choose your poison :)
Other alternatives are operating system RAM disks (avoids the GC
problem) and using SSDs (nearly the same
On Thu, 2011-06-30 at 11:38 +0200, Russell B wrote:
a multivalued field labelled category which for each document defines
where in the tree it should appear. For example: doc1 has the
category field set to 0/topics, 1/topics/computing,
2/topics/computing/systems.
I then facet on the
On Mon, 2011-07-04 at 13:51 +0200, Jame Vaalet wrote:
What would be the maximum size of a single Solr index file for optimum
search time?
There is no clear answer. It depends on the number of (unique) terms,
number of documents, bytes on storage, storage speed, query complexity,
On Fri, 2011-07-08 at 07:12 +0200, Nikhil Chhaochharia wrote:
However, if I upgrade to Solr 3.3, then the Virtual Memory of the Tomcat
process increases to roughly the index size (70GB). Any ideas why
this is happening?
Maybe you switched to MMapDirectory?
works for text fields.
- Toke Eskildsen, State and University Library, Denmark
to
implement and nearly all of your work on this will be usable for a
RAM-based solution, if you are not satisfied with the speed. Or you
could buy a small cheap SSD and have no more worries...
Regards,
Toke Eskildsen
On Mon, 2012-06-18 at 11:45 +0200, ramzesua wrote:
Hi all. I am using Solr 4.0 and trying to clear the index by query. At first I
use <delete><query>*:*</query></delete> with commit, but the index is still not
empty. I tried other queries, but it did not help. Then I tried delete by
`id`. It works fine, but
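For reference, the full delete-everything request body for the XML update handler looks like this (the commit may also be sent as a separate request, or as the commit=true request parameter):

```xml
<delete><query>*:*</query></delete>
<commit/>
```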
that allows for custom
ordering, but it sorts upon index open and thus has a fairly long start
up time. Besides, it is not in a proper state for production:
https://issues.apache.org/jira/browse/SOLR-2412
- Toke Eskildsen, State and University Library, Denmark
seconds without the server straining.
- Toke Eskildsen
?
What you're looking for is probably uniqueKey:
https://wiki.apache.org/solr/UniqueKey
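In schema.xml this is a single declaration (assuming the document ID field is called id):

```xml
<uniqueKey>id</uniqueKey>
```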
- Toke Eskildsen
that
they are talking about 10 million documents and 10,000 updates. That is
quite far from what you've got.
- Toke Eskildsen
On Fri, 2012-08-10 at 10:07 +0200, Lochschmied, Alexander wrote:
Coming from a SQL database based search system, we already have a set of
defined patterns associated with our searchable documents.
% matches zero or more characters
_ matches exactly one character
Example:
Doc 1: 'AB%CD',
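A straight translation of those SQL LIKE wildcards into Lucene wildcard syntax could be sketched as follows (an assumption on my part; it ignores escaping of literal * and ? characters):

```python
def like_to_wildcard(pattern: str) -> str:
    """Translate SQL LIKE wildcards to Lucene wildcard syntax.

    % -> *  (any number of characters)
    _ -> ?  (exactly one character)

    Sketch only: does not handle escaping of literal wildcards.
    """
    return pattern.replace("%", "*").replace("_", "?")

print(like_to_wildcard("AB%CD"))  # AB*CD
```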
On Mon, 2012-08-27 at 14:29 +0200, dhaivat dave wrote:
I am getting an error while indexing data to Solr. I am using the SolrJ APIs
and the XML request handler to index the document. I
am getting an error *org.apache.solr.common.SolrException: Unexpected
character 'F' (code
indexes are controlled
by different parties, where the parties do want to collaborate on the
distribution part but do not want to have their data indexed by the
other parties. We currently have this challenge.
Regards,
Toke Eskildsen
On Fri, 2012-08-31 at 13:35 +0200, Erick Erickson wrote:
Imagine you have two entries, aardvark and emu in your
multiValued field. How should that document sort relative to
another doc with camel and zebra? Any heuristic
you apply will be wrong for someone else
I see two obvious choices
and that choosing by setup would
require the user to have a fairly deep understanding.
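Two of the possible conventions for ordering documents on a multiValued field can be illustrated outside Solr (hypothetical data mirroring Erick's aardvark/emu example):

```python
# Sort documents by their smallest value, or by their largest:
# with multiValued fields the two conventions disagree.
docs = {
    "doc1": ["aardvark", "zebra"],
    "doc2": ["camel", "emu"],
}

by_min = sorted(docs, key=lambda d: min(docs[d]))
by_max = sorted(docs, key=lambda d: max(docs[d]))
print(by_min)  # ['doc1', 'doc2']
print(by_max)  # ['doc2', 'doc1']
```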
I accept that there is no clear need for the functionality at this point
in time and defer hacking on it.
Thank you for your input,
Toke Eskildsen
On Tue, 2012-09-11 at 08:00 +0200, Amey Patil wrote:
Our solr index (Solr 3.4) has over 100 million documents.
[...]
*((keyword1 AND keyword2...) OR (keyword3 AND keyword4...) OR ...) AND
date:[date1 TO *]*
No. of keywords can be in the range of 100 - 1000.
We are adding sort parameter *'date
On Mon, 2012-09-10 at 16:04 +0200, Claudio Ranieri wrote:
When I used the CollationKeyFilterFactory in my facet field (example below),
the facet values came out wrong. When I removed the
CollationKeyFilterFactory from the facet field's type, the values were correct.
As Ahmed wrote, CollationKeyFilter is meant for
On Tue, 2012-09-11 at 12:14 +0200, Claudio Ranieri wrote:
This is an interesting feature to be implemented, because we can sort
the results correctly, but not in the facets.
At work (State and University Library, Denmark) we use collator-ordered
faceting for author and title, but our current
.
Regards,
Toke Eskildsen
On Tue, 2012-09-25 at 01:50 +0200, balaji.gandhi wrote:
I am encountering this error randomly (under load) when posting to Solr
using SolrJ.
Has anyone encountered a similar error?
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at:
that is an issue or not depends on the content.
e.g. for email archives, the single index will not work very well.
- Toke Eskildsen, State and University Library, Denmark
On Tue, 2012-09-25 at 04:21 +0200, 韦震宇 wrote:
The company I'm working in has a website serving more than 10
customers, and every customer should have its own search category.
So I should create an independent index for every customer.
How many of the customers are active at any given
On Thu, 2012-09-27 at 13:49 +0200, aniljayanti wrote:
But I am getting an error with the below.
q=Oot \ Aboot
Error message :
--
message org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \':
Lexical error at line 1, column 6. Encountered: EOF after :
It seems like you are
On Fri, 2012-09-28 at 14:43 +0200, Claudio Ranieri wrote:
name | city
Jose | Campinas
Jose | São Paulo
Jose | Rio de Janeiro
Jose | Rio Branco
Jose | Ourinhos
In search by Jose, I wish return on top the documents (Jose | São
Paulo and Jose | Rio de Janeiro).
If all documents have a city
On Mon, 2012-10-01 at 14:20 +0200, Claudio Ranieri wrote:
Is there a way to omit the cities with boosting 1?
The number of cities is big, but the number of important cities is small.
Sorry, not with this simple trick.
Maybe a function query, as 曹霖 suggests, can help you, but I have no
the potential
combinatorial explosion of your primary/secondary values.
So that leaves the question: How many distinct combinations of primary
and secondary values do you have?
Regards,
Toke Eskildsen
On Mon, 2012-10-08 at 13:08 +0200, Sujatha Arun wrote:
I am unable to unzip the 5883_Code.zip file for Solr 1.4 from the Packtpub
site. I get the error message
End-of-central-directory signature not found. [...]
It is a corrupt ZIP-file. I'm guessing you got it from
On Wed, 2012-10-10 at 14:15 +0200, Kissue Kissue wrote:
I have added the string *-BAAN-* to the index, to a field called pattern
which is a string type. Now I want to be able to search for A100-BAAN-C20
or ZA20-BAAN-300 and have Solr return *-BAAN-*.
That sounds a lot like the problem
in
development hours. I would suggest hacking the current faceting code to
use OpenBitSet instead of int[] and doing performance tests on that.
PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts
seem to be the right places to look in Solr 4.
Regards,
Toke Eskildsen, State and University
On Thu, 2013-03-14 at 13:10 +0100, Arkadi Colson wrote:
When I shut down Tomcat, free -m and top keep telling me the same values.
Almost no free memory...
Any idea?
Are you reading top/free right? It is standard behaviour for most
modern operating systems to have very little free memory. As
not need to facet on all fields all the time.
If you do need to facet on all fields on each call, you will need to
scale to many machines to get proper performance and the merging
overhead will likely be huge.
Regards,
Toke Eskildsen
Toke Eskildsen [t...@statsbiblioteket.dk] wrote:
[Solr, 11M documents, 5000 facet fields, 12GB RAM, OOM]
5000 fields @ 9 MByte is about 45GB for faceting.
If you are feeling really adventurous, take a look at
https://issues.apache.org/jira/browse/SOLR-2412
I tried building a test-index
with SSD. It has 16GB of RAM and runs two search instances, each with
~11M documents, one with a 52GB index, one with 71GB.
- Toke Eskildsen
minutes), we will have to
look into this.
On that note, Lucene's faceting with a central repository for the facet
terms looks very interesting as it opens up for both fast startup and
fast queries.
Regards,
Toke Eskildsen
to the FieldCache. [...]
I haven't used it yet, but DocValues in Solr 4.2 seems to be the answer.
- Toke Eskildsen
attribute for StrField,
UUIDField and all Trie*Fields, but I do not know if they are used
automatically by sort or if they should be requested explicitly.
Regards,
Toke Eskildsen
you hit OOM, changing to 3GB seems like a better choice than
4GB to me. Especially since you describe the allocation up to 3GB as gradual,
which tells me that your installation is not starved with 3GB.
- Toke Eskildsen
requirement, it is far from the 25GB that you are
allocating. Either you have an interestingly high number somewhere in the
equation or something's off.
Regards,
Toke Eskildsen
Toke Eskildsen [t...@statsbiblioteket.dk]:
If your whole index has 10M documents, which each has 100 values
for each field, with each field having 50M unique values, then the
memory requirement would be more than
10M*log2(100*10M) + 100*10M*log2(50M) bit ~= 340MB/field ~=
1.6GB for faceting
. Facets and sorting are often
memory hungry, but your system seems to have 13GB free RAM so the easy
solution attempt would be to increase the heap until Solr serves the
facets without OOM.
- Toke Eskildsen, State and University Library, Denmark
the faceting can be prepared?
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
Regards,
Toke Eskildsen
list is about
these topics.
How often do you commit and how many unique values do your facet
fields have?
Regards,
Toke Eskildsen
On Tue, 2013-04-02 at 17:08 +0200, Dotan Cohen wrote:
Most of the time I facet on one field that has about twenty unique
values.
They are likely to be disk cached so warming those for 9M documents
should only take a few seconds.
However, once per day I would like to facet on the text field,
with productZ,
version 85
compatible_engine:productZ* to get all products compatible with any version of
productZ.
- Toke Eskildsen
On Tue, 2013-04-09 at 08:40 +0200, It-forum wrote:
On 08/04/2013 20:02, Toke Eskildsen wrote:
compatible_engine:productZ/85 to get all products compatible with productZ,
version 85
compatible_engine:productZ* to get all products compatible with any version
of productZ.
Whoops, slash
to pinpoint the memory eater in your
setup?
- Toke Eskildsen
(of which the majority will be null) will be
#clients*#documents*#facet_fields
This means that adding a new client will be progressively more
expensive.
On the other hand, if you use a lot of small shards, DocValues should
work for you.
Regards,
Toke Eskildsen
prices of SSDs I would really advise that you choose that road
instead.
Regards,
Toke Eskildsen, State and University Library, Denmark
warmups still running when new
commits are triggered.
Regards,
Toke Eskildsen, State and University Library, Denmark
JDK (look somewhere in the bin
folder), is your friend. Just start it on the server and click on the relevant
process.
Regards,
Toke Eskildsen
Whoops. I made some mistakes in the previous post.
Toke Eskildsen [t...@statsbiblioteket.dk]:
Extrapolating from 1.4M documents and 180 clients, let's say that
there are 1.4M/180/5 unique terms for each sort-field and that their
average length is 10. We thus have
1.4M*log2(1500*10*8) + 1500
On Wed, 2013-04-17 at 20:06 +0200, Mingfeng Yang wrote:
I am doing faceting on an index of 120M documents,
on the field of url[...]
I would guess that you would need 3-4GB for that.
How much memory do you allocate to Solr?
- Toke Eskildsen