Re: Hardware Specs Question

2010-09-02 Thread Toke Eskildsen
On Thu, 2010-09-02 at 03:37 +0200, Lance Norskog wrote: I don't know how much SSD disks cost, but they will certainly cure the disk i/o problem. We've done a fair amount of experimentation in this area (1997-era SSDs vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in RAID

Re: Hardware Specs Question

2010-09-03 Thread Toke Eskildsen
On Fri, 2010-09-03 at 03:45 +0200, Shawn Heisey wrote: On 9/2/2010 2:54 AM, Toke Eskildsen wrote: We've done a fair amount of experimentation in this area (1997-era SSDs vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in RAID 0). The harddisk setups never stood a chance

Re: Hardware Specs Question

2010-09-03 Thread Toke Eskildsen
On Fri, 2010-09-03 at 11:07 +0200, Dennis Gearon wrote: If you really want to see performance, try external DRAM disks. Whew! 800X faster than a disk. As sexy as they are, the DRAM drives do not buy much extra performance. At least not at the search stage. For searching, SSDs are not

RE: Hardware Specs Question

2010-09-06 Thread Toke Eskildsen
From: Dennis Gearon [gear...@sbcglobal.net]: I wouldn't have thought that CPU was a big deal with the speed/cores of CPU's continuously growing according to Moore's law and the change in Disk Speed barely changing 50% in 15 years. Must have a lot to do with caching. I am not sure I follow you?

Re: persistent cache

2010-02-17 Thread Toke Eskildsen
enough to get a test-machine with 2 types of SSD, 2 10,000 RPM harddisks and 2 15,000 RPM harddisks. Some quick notes can be found at http://wiki.statsbiblioteket.dk/summa/Hardware The world has moved on since then, but that has only widened the gap between SSDs and harddisks. Regards, Toke

RE: cores shards and disks in SolrCloud

2012-11-16 Thread Toke Eskildsen
On Fri, 2012-11-16 at 02:18 +0100, Buttler, David wrote: Obviously, I could replicate the data so that I wouldn't lose any documents while I replace my disk, but since I am already storing the original data in HDFS, (with a 3x replication), adding additional replication for solr eats into my

Re: error opening index solr 4.0 with lukeall-4.0.0-ALPHA.jar

2012-11-18 Thread Toke Eskildsen
On Mon, 2012-11-19 at 08:10 +0100, Bernd Fehling wrote: I think there is already a BETA available: http://luke.googlecode.com/svn/trunk/ You might try that one. That doesn't work either for Lucene 4.0.0 indexes, same for source trunk. I did have some luck with downloading the source and

Re: Help with sort on dynamic field and out of memory error

2012-11-28 Thread Toke Eskildsen
could reduce memory consumption to 1/10 of the worst case 7GB, if the values are fairly uniform. Of course, if the values are all over the place, this gains you nothing at all. Regards, Toke Eskildsen

Re: Stored hierachical data in Solr

2013-01-16 Thread Toke Eskildsen
/solr/HierarchicalFaceting Regards, Toke Eskildsen

RE: long QTime for big index

2013-01-31 Thread Toke Eskildsen
, Toke Eskildsen, State and University Library, Denmark

RE: What should focus be on hardware for solr servers?

2013-02-13 Thread Toke Eskildsen
expect to handle, what do you expect a query to look like, how should the result be presented? Regards, Toke Eskildsen

RE: What should focus be on hardware for solr servers?

2013-02-13 Thread Toke Eskildsen
to the documents they belong to. The penalty for having thousands or millions of terms as compared to tens or hundreds in a field in an inverted index is very small. We're still in any random machine you've got available-land so I second Michael's suggestion. Regards, Toke Eskildsen

RE: What should focus be on hardware for solr servers?

2013-02-14 Thread Toke Eskildsen
Regards, Toke Eskildsen

Re: SOLR4 SAN vs Local Disk?

2013-02-20 Thread Toke Eskildsen
On Tue, 2013-02-19 at 18:39 +0100, chamara wrote: Hi Thanks Shawn for the Input, Yes i am using SolrCloud to replicate the index to another server running with the same spec with 32cores and 72GB RAM on each machine. I have to test the performance of RAID 10? Have you ever done a deployment

Re: Newbie question on recurring theme: Dynamic Fields

2013-02-20 Thread Toke Eskildsen
On Wed, 2013-02-20 at 10:06 +0100, Erik Dybdahl wrote: However, after defining <field name="customerField_*" type="string" indexed="true" stored="true" multiValued="true"/> Seems like a typo to me: You need to write dynamicField, not field, when defining a dynamic field. Regards, Toke Eskildsen
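
A minimal sketch of the fix suggested above: the same attributes, but on a dynamicField element rather than a field element. The attribute values are taken from the quoted definition; the example field name customerField_color is hypothetical, and the glob check with fnmatchcase only illustrates the idea of a trailing-wildcard name and is not Solr's own matching code.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Attributes copied from the quoted schema line.
attrs = {
    "name": "customerField_*",
    "type": "string",
    "indexed": "true",
    "stored": "true",
    "multiValued": "true",
}

# The element name is the only change: dynamicField instead of field.
dynamic_field = ET.Element("dynamicField", attrs)
print(ET.tostring(dynamic_field, encoding="unicode"))

# Hypothetical incoming field name, checked against the wildcard pattern
# (illustration only; Solr performs its own matching).
matches = fnmatchcase("customerField_color", attrs["name"])
print(matches)
```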

Re: Running solr on small amounts of RAM

2011-09-12 Thread Toke Eskildsen
On Fri, 2011-09-09 at 18:48 +0200, Mike Austin wrote: Our index is very small with 100k documents and a light load at the moment. If I wanted to use the smallest possible RAM on the server, how would I do this and what are the issues? The index size depends just as much on the size of the

Re: Generating large datasets for Solr proof-of-concept

2011-09-16 Thread Toke Eskildsen
On Thu, 2011-09-15 at 22:54 +0200, Pulkit Singhal wrote: Has anyone ever had to create large mock/dummy datasets for test environments or for POCs/Demos to convince folks that Solr was the wave of the future? Yes, but I did it badly. The problem is that real data are not random so any simple

Re: Seek your wisdom for implementing 12 million docs..

2011-09-26 Thread Toke Eskildsen
On Sun, 2011-09-25 at 22:00 +0200, Ikhsvaku S wrote: Documents: We have close to ~12 million XML docs, of varying sizes average size 20 KB. These documents have 150 fields, which should be searchable indexed. [...] Approximately ~6000 such documents are updated 400-800 new ones are added

Re: drastic performance decrease with 20 cores

2011-09-27 Thread Toke Eskildsen
On Tue, 2011-09-27 at 02:43 +0200, Bictor Man wrote: thanks for your replies. indeed the filesystem caching seems to be the difference. sadly I can't add more memory and the 6GB/20core combination doesn't work. so I'll just try to tweak it as much as I can. A (better) alternative to more

Re: strange performance issue with many shards on one server

2011-09-28 Thread Toke Eskildsen
On Wed, 2011-09-28 at 12:58 +0200, Frederik Kraus wrote: - 10 shards per server (needed for response times) running in a single tomcat instance Have you tested that sharding actually decreases response times in your case? I see the idea in decreasing response times with sharding at the cost of

Re: a bug of solr distributed search

2010-10-25 Thread Toke Eskildsen
sharding is given, it should be followed with a "but be aware that it will make relevance ranking unreliable". Regards, Toke Eskildsen

Re: a bug of solr distributed search

2010-10-25 Thread Toke Eskildsen
, logical grouping and distributed IDF? Regards, Toke Eskildsen

Re: AW: FieldCache

2010-10-25 Thread Toke Eskildsen
On Mon, 2010-10-25 at 09:41 +0200, Mathias Walter wrote: [...] I enabled the field cache for my ID field and another single char field (PAS type) to get the benefit of accessing the fields with an array. Unfortunately, the IDs are too large to fit in memory. I gave 12 GB of RAM to each node

Re: a bug of solr distributed search

2010-10-27 Thread Toke Eskildsen
. The problem is of course to judge the quality of the outputs, but setting the single index as the norm and plotting the differences in document positions in the result sets might provide some insight. Regards, Toke Eskildsen

Re: How do I this in Solr?

2010-10-27 Thread Toke Eskildsen
working idea. Maybe Varun could comment on the maximum numbers of terms that his queries will contain? Regards, Toke Eskildsen On Wed, 2010-10-27 at 15:02 +0200, Mike Sokolov wrote: Right - my point was to combine this with the previous approaches to form a query like: samsung AND android

Re: how well does multicore scale?

2010-10-27 Thread Toke Eskildsen
On Wed, 2010-10-27 at 14:20 +0200, mike anderson wrote: [...] By my simple math, this would mean that if we want each shard's index to be able to fit in memory, [...] Might I ask why you're planning on using memory-based sharding? The performance gap between memory and SSDs is not very big so

RE: Solr sorting problem

2010-10-27 Thread Toke Eskildsen
Jonathan Rochkind [rochk...@jhu.edu] wrote: I too sometimes have similar use cases, and my best ideas about how to solve them involve using faceting --- you can facet on a multi-valued field, and you can sort facets--but you can only sort facets by index order, a strict byte-by-byte sort.

RE: how well does multicore scale?

2010-10-27 Thread Toke Eskildsen
. Regards, Toke Eskildsen

Re: Natural string sorting

2010-10-29 Thread Toke Eskildsen
On Fri, 2010-10-29 at 10:18 +0200, RL wrote: Executing a query and sorting by this field leads to unnatural sorting of : string1 string10 string2 That's very much natural. Numbers are not treated any differently from words made up of letters. You have to use alignment if you want to use
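
A small sketch of what alignment could look like, reading it here as left-padding every run of digits with zeros so that plain character-by-character ordering agrees with numeric ordering. The helper name pad_numbers and the width of 10 are assumptions made for illustration; the snippet above is truncated and does not say how the padding should be done.

```python
import re


def pad_numbers(value, width=10):
    """Left-pad every run of digits with zeros to a fixed width (hypothetical helper)."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), value)


values = ["string1", "string10", "string2"]

# Plain lexicographic order: "string10" sorts before "string2".
plain = sorted(values)

# Order by the zero-padded form: numbers compare as expected.
aligned = sorted(values, key=pad_numbers)

print(plain)
print(aligned)
```

The padded form would be what gets sorted on, while the original value is kept for display.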

Re: Looking for Developers

2010-10-29 Thread Toke Eskildsen
On Fri, 2010-10-29 at 10:06 +0200, Mark Allan wrote: For me, I simply deleted the original email, but I'm now quite enjoying the irony of the complaints causing more noise on the list than the original email! ;-) He he. An old classic. Next in line is the meta-meta-discussion about

RE: Ensuring stable timestamp ordering

2010-10-31 Thread Toke Eskildsen
Lance Norskog [goks...@gmail.com] wrote: It would be handy to have an auto-incrementing date field, so that each document would get a unique number and the timestamp would then be the unique ID of the document. If someone wants to implement this, I'll just note that the granularity of Solr

RE: Ensuring stable timestamp ordering

2010-10-31 Thread Toke Eskildsen
Dennis Gearon [gear...@sbcglobal.net] wrote: Even microseconds may not be enough on some really good, fast machine. True, especially since the timer might not provide microsecond granularity although the returned value is in microseconds. However, a unique timestamp generator should keep

RE: Ensuring stable timestamp ordering

2010-11-02 Thread Toke Eskildsen
Dennis Gearon [gear...@sbcglobal.net] wrote: how about a timestamp with either a GUID appended on the end of it? Since long (8 bytes) is the largest atomic type supported by Java, this would have to be represented as a String (or rather BytesRef) and would take up 4 + 32 bytes + 2 * 4 bytes

Re: my index has 500 million docs ,how to improve solr search performance?

2010-11-15 Thread Toke Eskildsen
On Mon, 2010-11-15 at 06:35 +0100, lu.rongbin wrote: In addition, my index has only two stored fields, id and price, and other fields are indexed. I increase the document and query cache. the ec2 m2.4xLarge instance is 8 cores, 68G memory. all indexes size is about 100G. Looking at

Re: Solr Memory Usage

2010-12-14 Thread Toke Eskildsen
On Tue, 2010-12-14 at 06:07 +0100, Cameron Hurst wrote: [Cameron expected 150MB overhead] As I start to index data and passing queries to the database I notice a steady rise in the RAM but it doesn't stop at 150MB. If I continue to reindex the exact same data set with no additional data

RE: Explanation of the different caches.

2010-12-21 Thread Toke Eskildsen
Stijn Vanhoorelbeke [stijn.vanhoorelb...@gmail.com] wrote: I want to do a quick & dirty load testing - but all my results are cached. I commented out all the Solr caches - but still everything is cached. * Can the caching come from the 'Field Collapsing Cache'. -- although I don't see this

Re: solr benchmarks

2011-01-02 Thread Toke Eskildsen
On Sat, 2011-01-01 at 03:06 +0100, Tri Nguyen wrote: I remember going through some page that had graphs of response times based on index size for solr. Anyone know of such pages? Sorry, no. Some small scale tests with our corpus showed that response times suffered less than proportionally

Re: Improving Solr performance

2011-01-07 Thread Toke Eskildsen
is the same regardless of the number of drives. If your current response time for a single user is satisfactory, adding drives is a viable solution for you. I'll still recommend the SSD option though, as it will also lower the response time for a single query. Regards, Toke Eskildsen

Re: Improving Solr performance

2011-01-10 Thread Toke Eskildsen
On Mon, 2011-01-10 at 21:43 +0100, Paul wrote: I see from your other messages that these indexes all live on the same machine. You're almost certainly I/O bound, because you don't have enough memory for the OS to cache your index files. With 100GB of total index size, you'll get

Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Toke Eskildsen
On Fri, 2011-01-14 at 13:05 +0100, Cathy Hemsley wrote: I hope you can help. We are migrating our intranet web site management system to Windows 2008 and need a replacement for Index Server to do the text searching. I am trying to establish if Lucene and Solr is a feasible replacement, but I

RE: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Toke Eskildsen
[] ASF Mirrors (linked in our release announcements or via the Lucene website) [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [X] I/we build them from source via an SVN/Git checkout. [X] Other (someone in your company mirrors them internally or via a downstream project)

Re: unix permission styles for access control

2011-01-19 Thread Toke Eskildsen
On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote: I was wondering if there are binary operation filters? Haven't seen any in the book nor was able to find any using google. So if I had 0600(octal) in a permission field, and I wanted to return any records that 'permission

Re: pruning search result with search score gradient

2011-01-20 Thread Toke Eskildsen
On Tue, 2011-01-11 at 12:12 +0100, Julien Piquot wrote: I would like to be able to prune my search result by removing the less relevant documents. I'm thinking about using the search score : I use the search scores of the document set (I assume they are sorted by descending order),

Re: Performance optimization of Proximity/Wildcard searches

2011-01-25 Thread Toke Eskildsen
On Tue, 2011-01-25 at 10:20 +0100, Salman Akram wrote: Cache warming is a good option too but the index gets updated every hour so not sure how much would that help. What is the time difference between queries with a warmed index and a cold one? If the warmed index performs satisfactorily, then

Re: faceting over ngrams

2011-03-16 Thread Toke Eskildsen
On Wed, 2011-03-16 at 13:05 +0100, Dmitry Kan wrote: Hello guys. We are using shard'ed solr 1.4 for heavy faceted search over the trigrams field with about 1 million of entries in the result set and more than 100 million of entries to facet on in the index. Currently the faceted search is very

Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-17 Thread Toke Eskildsen
On Wed, 2011-03-16 at 18:36 +0100, Erik Hatcher wrote: Sorry, I missed the original mail on this thread I put together that hierarchical faceting wiki page a couple of years ago when helping a customer evaluate SOLR-64 vs. SOLR-792 vs.other approaches. Since then, SOLR-792 morphed and

Re: Showing facet of first N docs

2011-06-20 Thread Toke Eskildsen
On Thu, 2011-06-16 at 12:39 +0200, Tommaso Teofili wrote: Do you know if it is possible to show the facets for a particular field related only to the first N docs of the total number of results? It collides with the inner working in Solr, as faceting does not process the doc-IDs from the

Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-29 Thread Toke Eskildsen
On Wed, 2011-06-29 at 09:35 +0200, eks dev wrote: In MMAP, you need to have really smart warm up (MMAP) to beat IO quirks, for RAMDir you need to tune gc(), choose your poison :) Other alternatives are operating system RAM disks (avoids the GC problem) and using SSDs (nearly the same

Re: Taxonomy faceting

2011-06-30 Thread Toke Eskildsen
On Thu, 2011-06-30 at 11:38 +0200, Russell B wrote: a multivalued field labelled category which for each document defines where in the tree it should appear. For example: doc1 has the category field set to 0/topics, 1/topics/computing, 2/topics/computing/systems. I then facet on the
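
The depth-prefixed category values described above can be generated mechanically from a single path. This is a minimal sketch under the assumption that each value is the depth, a slash, and the path down to that level; the function name path_tokens is hypothetical and not part of Solr.

```python
def path_tokens(path):
    """Return one depth-prefixed token per level of a slash-separated path."""
    parts = [p for p in path.split("/") if p]
    tokens = []
    for depth in range(len(parts)):
        prefix = "/".join(parts[: depth + 1])
        tokens.append(str(depth) + "/" + prefix)
    return tokens


tokens = path_tokens("topics/computing/systems")
print(tokens)
```

Each token would be one value in the multivalued category field, so a facet restricted to a given depth prefix lists only the nodes at that level of the tree.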

Re: what s the optimum size of SOLR indexes

2011-07-04 Thread Toke Eskildsen
On Mon, 2011-07-04 at 13:51 +0200, Jame Vaalet wrote: What would be the maximum size of a single SOLR index file for resulting in optimum search time ? There is no clear answer. It depends on the number of (unique) terms, number of documents, bytes on storage, storage speed, query complexity,

Re: Virtual Memory usage increases beyond Xmx with Solr 3.3

2011-07-08 Thread Toke Eskildsen
On Fri, 2011-07-08 at 07:12 +0200, Nikhil Chhaochharia wrote: However, if I upgrade to Solr 3.3, then the Virtual Memory of the Tomcat process increases to roughly the index size (70GB). Any ideas why this is happening? Maybe you switched to MMapDirectory?

Re: How to cap facet counts beyond a specified limit

2012-06-08 Thread Toke Eskildsen
works for text fields. - Toke Eskildsen, State and University Library, Denmark

Re: what's better for in memory searching?

2012-06-11 Thread Toke Eskildsen
to implement and nearly all of your work on this will be usable for a RAM-based solution, if you are not satisfied with the speed. Or you could buy a small cheap SSD and have no more worries... Regards, Toke Eskildsen

Re: delete by query don't work

2012-06-18 Thread Toke Eskildsen
On Mon, 2012-06-18 at 11:45 +0200, ramzesua wrote: Hi all. I am using solr 4.0 and trying to clear index by query. At first I use <delete><query>*:*</query></delete> with commit, but index is still not empty. I tried another queries, but it not help me. Then I tried delete by `id`. It works fine, but

Re: Solr faceting -- sort order

2012-07-19 Thread Toke Eskildsen
that allows for custom ordering, but it sorts upon index open and thus has a fairly long start up time. Besides, it is not in a proper state for production: https://issues.apache.org/jira/browse/SOLR-2412 - Toke Eskildsen, State and University Library, Denmark

Re: Importing index - Real Time or Queued?

2012-07-19 Thread Toke Eskildsen
seconds without the server straining. - Toke Eskildsen

Re: Importing index - Real Time or Queued?

2012-07-19 Thread Toke Eskildsen
? What you're looking for is probably uniqueKey: https://wiki.apache.org/solr/UniqueKey - Toke Eskildsen

Re: Importing index - Real Time or Queued?

2012-07-19 Thread Toke Eskildsen
that they are talking about 10 million documents and 10,000 updates. That is quite far from what you've got. - Toke Eskildsen

Re: Indexing wildcard patterns

2012-08-10 Thread Toke Eskildsen
On Fri, 2012-08-10 at 10:07 +0200, Lochschmied, Alexander wrote: Coming from a SQL database based search system, we already have a set of defined patterns associated with our searchable documents. % matches no or any number of characters _ matches one character Example: Doc 1: 'AB%CD',

Re: Error while indexing data using Solr (Unexpected character 'F' (code 70) in prolog; expected '<')

2012-08-27 Thread Toke Eskildsen
On Mon, 2012-08-27 at 14:29 +0200, dhaivat dave wrote: I am getting an error while indexing data to solr. i am using solrj apis to index the document and using the xml request handler to index document. i am getting an error *org.apache.solr.common.SolrException: Unexpected character 'F' (code

Re: Solr4 distributed IDF

2012-09-03 Thread Toke Eskildsen
indexes are controlled by different parties, where the parties do want to collaborate on the distribution part but do not want to have their data indexed by the other parties. We currently have this challenge. Regards, Toke Eskildsen

Re: Sorting on mutivalued fields still impossible?

2012-09-05 Thread Toke Eskildsen
On Fri, 2012-08-31 at 13:35 +0200, Erick Erickson wrote: Imagine you have two entries, aardvark and emu in your multiValued field. How should that document sort relative to another doc with camel and zebra? Any heuristic you apply will be wrong for someone else I see two obvious choices

Re: Sorting on mutivalued fields still impossible?

2012-09-10 Thread Toke Eskildsen
and that choosing by setup would require the user to have a fairly deep understanding. I accept that there is no clear need for the functionality at this point in time and defer hacking on it. Thank you for your input, Toke Eskildsen

Re: Solr Sorting Caching

2012-09-11 Thread Toke Eskildsen
On Tue, 2012-09-11 at 08:00 +0200, Amey Patil wrote: Our solr index (Solr 3.4) has over 100 million documents. [...] *((keyword1 AND keyword2...) OR (keyword3 AND keyword4...) OR ...) AND date:[date1 TO *]* No. of keywords can be in the range of 100 - 1000. We are adding sort parameter *'date

Re: RES: RES: Problem with accented words sorting

2012-09-11 Thread Toke Eskildsen
On Mon, 2012-09-10 at 16:04 +0200, Claudio Ranieri wrote: When I used the CollationKeyFilterFactory in my facet (example below), the value of facet went wrong. When I remove the CollationKeyFilterFactory of type of facet, the value went correct. As Ahmed wrote, CollationKeyFilter is meant for

Re: RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Toke Eskildsen
On Tue, 2012-09-11 at 12:14 +0200, Claudio Ranieri wrote: This is an interesting feature to be implemented, because we can sort the results correctly, but not in the facets. At work (State and University Library, Denmark) we use collator-ordered faceting for author and title, but our current

Re: RES: RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Toke Eskildsen
. Regards, Toke Eskildsen

Re: SolrJ - IOException

2012-09-25 Thread Toke Eskildsen
On Tue, 2012-09-25 at 01:50 +0200, balaji.gandhi wrote: I am encountering this error randomly (under load) when posting to Solr using SolrJ. Has anyone encountered a similar error? org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at:

Re: How can I create about 100000 independent indexes in Solr?

2012-09-25 Thread Toke Eskildsen
that is an issue or not depends on the content. e.g. for email archives, the single index will not work very well. - Toke Eskildsen, State and University Library, Denmark

Re: How can I create about 100000 independent indexes in Solr?

2012-09-25 Thread Toke Eskildsen
On Tue, 2012-09-25 at 04:21 +0200, 韦震宇 wrote: The company I'm working in has a website to serve more than 10 customers, and every customer should have its own search category. So I should create independent index for every customer. How many of the customers are active at any given

Re: Problem with Special Characters in SOLR Query

2012-09-27 Thread Toke Eskildsen
On Thu, 2012-09-27 at 13:49 +0200, aniljayanti wrote: But getting error with below. q=Oot \ Aboot Error message : -- message org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \': Lexical error at line 1, column 6. Encountered: EOF after : It seems like you are

Re: Boosting in query level the relevance based in content of any fields

2012-10-01 Thread Toke Eskildsen
On Fri, 2012-09-28 at 14:43 +0200, Claudio Ranieri wrote: name | city Jose | Campinas Jose | São Paulo Jose | Rio de Janeiro Jose | Rio Branco Jose | Ourinhos In search by Jose, I wish return on top the documents (Jose | São Paulo and Jose | Rio de Janeiro). If all documents has a city

Re: RES: Boosting in query level the relevance based in content of any fields

2012-10-01 Thread Toke Eskildsen
On Mon, 2012-10-01 at 14:20 +0200, Claudio Ranieri wrote: Is there a way to omit the cities with boosting 1? The number of cities is big, but the number of important cities is small. Sorry, not with this simple trick. Maybe a function query, as 曹霖 suggests, can help you, but I have no

Re: Problem with relating values in two multi value fields

2012-10-08 Thread Toke Eskildsen
the potential combinatorial explosion of your primary and secondary values. So that leaves the question: How many distinct combinations of primary and secondary values do you have? Regards, Toke Eskildsen

Re: solr1.4 code Example

2012-10-08 Thread Toke Eskildsen
On Mon, 2012-10-08 at 13:08 +0200, Sujatha Arun wrote: I am unable to unzip the 5883_Code.zip file for solr 1.4 from packtpub site. I get the error message End-of-central-directory signature not found. [...] It is a corrupt ZIP-file. I'm guessing you got it from

Re: Wild card searching - well sort of

2012-10-10 Thread Toke Eskildsen
On Wed, 2012-10-10 at 14:15 +0200, Kissue Kissue wrote: I have added the string: *-BAAN-* to the index to a field called pattern which is a string type. Now i want to be able to search for A100-BAAN-C20 or ZA20-BAAN-300 and have Solr return *-BAAN-*. That sounds a lot like the problem

Re: Unique terms without faceting

2012-10-11 Thread Toke Eskildsen
in development hours. I would suggest hacking the current faceting code to use OpenBitSet instead of int[] and doing performance tests on that. PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts seem to be the right places to look in Solr 4. Regards, Toke Eskildsen, State and University

Re: OutOfMemoryError

2013-03-14 Thread Toke Eskildsen
On Thu, 2013-03-14 at 13:10 +0100, Arkadi Colson wrote: When I shutdown tomcat free -m and top keeps telling me the same values. Almost no free memory... Any idea? Are you reading top and free right? It is standard behaviour for most modern operating systems to have very little free memory. As

Re: Facets with 5000 facet fields

2013-03-18 Thread Toke Eskildsen
not need to facet on all fields all the time. If you do need to facet on all fields on each call, you will need to scale to many machines to get proper performance and the merging overhead will likely be huge. Regards, Toke Eskildsen

RE: Facets with 5000 facet fields

2013-03-19 Thread Toke Eskildsen
Toke Eskildsen [t...@statsbiblioteket.dk] wrote: [Solr, 11M documents, 5000 facet fields, 12GB RAM, OOM] 5000 fields @ 9 MByte is about 45GB for faceting. If you are feeling really adventurous, take a look at https://issues.apache.org/jira/browse/SOLR-2412 I tried building a test-index
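
The 45GB figure above is a straight multiplication, sketched here for reference. The field count, the per-field cost and the 12GB of RAM come from the snippet; treating 1 GB as 1000 MB is an assumption made to match the rounding used there.

```python
facet_fields = 5000        # number of facet fields, from the snippet
mbyte_per_field = 9        # estimated faceting cost per field, from the snippet
available_gbyte = 12       # RAM on the machine, from the snippet

total_mbyte = facet_fields * mbyte_per_field
total_gbyte = total_mbyte / 1000  # assumption: decimal gigabytes

print(total_mbyte, "MB is about", total_gbyte, "GB")
print("exceeds available RAM:", total_gbyte > available_gbyte)
```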

Re: Facets with 5000 facet fields

2013-03-20 Thread Toke Eskildsen
with SSD. It has 16GB of RAM and runs two search instances, each with ~11M documents, one with a 52GB index, one with 71GB. - Toke Eskildsen

Re: Facets with 5000 facet fields

2013-03-20 Thread Toke Eskildsen
minutes), we will have to look into this. On that note, Lucene's faceting with a central repository for the facet terms looks very interesting as it opens up for both fast startup and fast queries. Regards, Toke Eskildsen

Re: Sort-field for ALL docs in FieldCache for sort queries - OOM on lots of docs

2013-03-21 Thread Toke Eskildsen
to the FieldCache. [...] I haven't used it yet, but DocValues in Solr 4.2 seems to be the answer. - Toke Eskildsen

Re: Sort-field for ALL docs in FieldCache for sort queries - OOM on lots of docs

2013-03-21 Thread Toke Eskildsen
attribute for StrField, UUIDField and all Trie*Fields, but I do not know if they are used automatically by sort or if they should be requested explicitly. Regards, Toke Eskildsen

RE: SOLR4/lucene and JVM memory management

2013-03-24 Thread Toke Eskildsen
you hit OOM, changing to 3GB seems like a better choice than 4GB to me. Especially since you describe the allocation up to 3GB as gradual, which tells me that your installation is not starved with 3GB. - Toke Eskildsen

RE: Solr using a ridiculous amount of memory

2013-03-24 Thread Toke Eskildsen
requirement, it is far from the 25GB that you are allocating. Either you have an interestingly high number somewhere in the equation or something's off. Regards, Toke Eskildsen

RE: Solr using a ridiculous amount of memory

2013-03-24 Thread Toke Eskildsen
Toke Eskildsen [t...@statsbiblioteket.dk]: If your whole index has 10M documents, which each has 100 values for each field, with each field having 50M unique values, then the memory requirement would be more than 10M*log2(100*10M) + 100*10M*log2(50M) bit ~= 340MB/field ~= 1.6GB for faceting

Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
. Facets and sorting are often memory hungry, but your system seems to have 13GB free RAM so the easy solution attempt would be to increase the heap until Solr serves the facets without OOM. - Toke Eskildsen, State and University Library, Denmark

Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
the faceting can be prepared? https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F Regards, Toke Eskildsen

Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
list is about these topics. How often do you commit and how many unique values does your facet fields have? Regards, Toke Eskildsen

Re: Out of memory on some faceting queries

2013-04-03 Thread Toke Eskildsen
On Tue, 2013-04-02 at 17:08 +0200, Dotan Cohen wrote: Most of the time I facet on one field that has about twenty unique values. They are likely to be disk cached so warming those for 9M documents should only take a few seconds. However, once per day I would like to facet on the text field,

RE: Sub field indexing

2013-04-08 Thread Toke Eskildsen
with productZ, version 85 compatible_engine:productZ* to get all products compatible with any version of productZ. - Toke Eskildsen

Re: Sub field indexing

2013-04-09 Thread Toke Eskildsen
On Tue, 2013-04-09 at 08:40 +0200, It-forum wrote: Le 08/04/2013 20:02, Toke Eskildsen a écrit : compatible_engine:productZ/85 to get all products compatible with productZ, version 85 compatible_engine:productZ* to get all products compatible with any version of productZ. Whoops, slash

Re: Solr using a ridiculous amount of memory

2013-04-15 Thread Toke Eskildsen
to pinpoint the memory eater in your setup? - Toke Eskildsen

Re: Solr using a ridiculous amount of memory

2013-04-15 Thread Toke Eskildsen
(of which the majority will be null) will be #clients*#documents*#facet_fields This means that the adding a new client will be progressively more expensive. On the other hand, if you use a lot of small shards, DocValues should work for you. Regards, Toke Eskildsen

Re: first time with new keyword, solr take to much time to give the result

2013-04-16 Thread Toke Eskildsen
prices of SSDs I would really advise that you choose that road instead. Regards, Toke Eskildsen, State and University Library, Denmark

RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
warmups still running when new commits are triggered. Regards, Toke Eskildsen, State and University Library, Denmark

RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
JDK (look somewhere in the bin folder), is your friend. Just start it on the server and click on the relevant process. Regards, Toke Eskildsen

RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
Whoops. I made some mistakes in the previous post. Toke Eskildsen [t...@statsbiblioteket.dk]: Extrapolating from 1.4M documents and 180 clients, let's say that there are 1.4M/180/5 unique terms for each sort-field and that their average length is 10. We thus have 1.4M*log2(1500*10*8) + 1500

Re: facet.method enum vs fc

2013-04-18 Thread Toke Eskildsen
On Wed, 2013-04-17 at 20:06 +0200, Mingfeng Yang wrote: I am doing faceting on an index of 120M documents, on the field of url[...] I would guess that you would need 3-4GB for that. How much memory do you allocate to Solr? - Toke Eskildsen
