Yes, absolutely correct: a comma is missing at the end of line 10.
All key-value pairs inside the same block should be comma-separated, except
the last one.
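As an illustration of the rule (a made-up fragment; these keys are placeholders, not taken from the thread):

```json
{
  "query": {
    "rows": 10,
    "fl": "id,name",
    "sort": "score desc"
  }
}
```

Inside the "query" block, "rows" and "fl" each end with a comma; the last pair, "sort", must not.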
From: Shawn Heisey
Reply: solr-user@lucene.apache.org
and prefer closer to an OR for
smaller
> collections.
>
> -Doug
>
> On Tue, Feb 21, 2017 at 1:39 PM Fuad Efendi <f...@efendi.ca > wrote:
>
>> Thank you Ahmet, I will try it; sounds reasonable
>>
>>
>> From: Ahmet Arslan <iori...@yahoo.com.invalid >
t goes?
Ahmet
On Tuesday, February 21, 2017 4:28 AM, Fuad Efendi <f...@efendi.ca> wrote:
Hello,
Default TF-IDF performs poorly with 200 million documents indexed.
The query "Michael Jackson" may run 300ms, while "Michael The Jackson" runs over 3
seconds, with eDisMax. "Michael Jackson" runs 300ms instead of 3ms just
because of the huge number of hits and TF-IDF calculations. Solr 6.3.
Thanks,
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Relevancy, Recommender Systems
user
pass
dbname
localhost
1433
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Relevancy, Recommender Systems
From: Per Newgro <per.new...@gmx.ch>
Repl
Were you indexing new documents while reloading? “Previously we’ve done
reloads of a collection after changing solrconfig.xml without any issues.”
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Relevancy, Recommender Systems
From: Kelly, Frank <frank.ke...@here.com>
Correct: multivalued field with 1 shop IDs. Use case: a shopping network
in the U.S., for example for a big brand such as Walmart, where the user implicitly
provides an IP address or explicitly a Postal Code, so that we can find items in
his/her neighbourhood.
You basically provide “join” information via
No; historical logs for document updates are not provided. Users need to
implement such functionality themselves if needed.
From: Mahmoud Almokadem
Reply: solr-user@lucene.apache.org
, it will simplify life ;)
On November 4, 2016 at 12:05:13 PM, Fuad Efendi (f...@efendi.ca) wrote:
Yes we need that documented,
http://stackoverflow.com/questions/8924102/restricting-ip-addresses-for-jetty-and-solr
Of course Firewall is a must for extremely strong environments / large
corporations, DMZ
+ DMZ(s)
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Relevancy, Recommender Systems
On November 4, 2016 at 9:28:21 AM, David Smiley (david.w.smi...@gmail.com)
wrote:
I was just researching how to secure Solr by IP address and I finally
figured it out. Perhaps this might go
ry different.
I recently had an assignment at a well-known retail shop where we even designed
pre-query custom boosts so that we could customize typical (most important
for the business) queries per business needs
Thanks,
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Relevancy, Recomm
ed general connectivity/authentication problems.
Thanks,
Jamie
On Wed, Nov 2, 2016 at 4:58 PM, Fuad Efendi <f...@efendi.ca> wrote:
> In MySQL, this command will explicitly allow to connect from
> remote ICZ2002912 host, check MySQL documentation:
>
> GRANT ALL ON my
In MySQL, this command will explicitly allow connections from the remote ICZ2002912
host; check the MySQL documentation:
GRANT ALL ON mysite.* TO 'root'@'ICZ2002912' IDENTIFIED BY 'Oakton123';
On November 2, 2016 at 4:41:48 PM, Fuad Efendi (f...@efendi.ca) wrote:
This is the root of the problem
you need to allow MySQL & Co. to accept connections from ICZ2002912.
Plus, check DNS resolution, etc.
Thanks,
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Recommender Systems
On November 2, 2016 at 2:37:08 PM, Jamie Jackson (jamieja...@gmail.com) wrote:
I'm at a brick wall. Here
Consider sharding / SolrCloud if you need huge memory
just for the field cache. And you will be forced to consider it if you have more
than 2 billion documents (am I right? Lucene internal limitation,
Integer.MAX_VALUE)
Thanks,
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Rele
internal
caches.
Solr has a way to warm up internal caches before making a new searcher
available:
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
Make these queries typical for your use cases (for instance, *:* with faceting):
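A minimal sketch of such a warm-up in solrconfig.xml, using a QuerySenderListener on the newSearcher event (the field name below is a placeholder):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- match-all query with faceting, to populate the caches before the searcher goes live -->
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">country</str>
    </lst>
  </arr>
</listener>
```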
Thanks,
--
Fuad Efendi
(416
.
But it works fine with KeywordTokenizer.
Any idea why? Thanks,
--
Fuad Efendi
http://www.tokenizer.ca
Data Mining, Vertical Search
“What is the best way to stop Solr when it gets an OOM” (or just
becomes unresponsive because of swallowed exceptions)
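One common approach (a sketch, not something stated in the thread) is the HotSpot option that runs an external command when the JVM throws OutOfMemoryError, so the process dies cleanly instead of limping along:

```
# kill the Solr JVM as soon as it hits OOM; %p expands to the JVM's pid
-XX:OnOutOfMemoryError="kill -9 %p"
```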
--
Fuad Efendi
416-993-2060(cell)
On February 25, 2016 at 2:37:45 PM, CP Mishra (mishr...@gmail.com) wrote:
Looking at the previous threads (and in our tests), oom script spec
I can
manually create an httpclient and set up authentication but then I can't use
solrj.
Yes, correct; except that you _can_ use SolrJ with this custom HttpClient
instance (which will intercept authentication, and support cookies, SSL
or plain HTTP, Keep-Alive, etc.)
You can
Hi,
A few months ago I was able to modify the Wiki; I can't do it now, probably
because of http://wiki.apache.org/solr/ContributorsGroup
Please add me: FuadEfendi
Thanks!
--
Fuad Efendi, PhD, CEO
C: (416)993-2060
F: (416)800-6479
Tokenizer Inc., Canada
http://www.tokenizer.ca
Hi,
Please add me: FuadEfendi
Thanks!
--
http://www.tokenizer.ca
...
-Fuad Efendi
http://www.tokenizer.ca
-Original Message-
From: vybe3142 [mailto:vybe3...@gmail.com]
Sent: October-03-12 12:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Can SOLR Index UTF-16 Text
Thanks for all the responses. Problem partially solved (see below)
1. In a sense, my
your file to
Solr)
-Fuad Efendi
http://www.tokenizer.ca
-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: October-03-12 1:30 PM
To: solr-user@lucene.apache.org
Subject: RE: Can SOLR Index UTF-16 Text
Something is missing from the body of your Email... As I pointed
Solr can index bytearrays too: unigram, bigram, trigram... even bitsets,
tritsets, qatrisets ;- )
LOL, I caught a bad cold...
BTW, don't forget to configure UTF-8 as your default (Java) container
encoding...
-Fuad
have
such
large documents? This appears to be a hard limit based on 24 bytes in a
Java
int.
You can try facet.method=enum, but that may be too slow.
What release of Solr are you running?
-- Jack Krupansky
-Original Message-
From: Fuad Efendi
Sent: Monday, August 20, 2012 4:34 PM
To: Solr
of 24-bytes in a
Java
int.
You can try facet.method=enum, but that may be too slow.
What release of Solr are you running?
-- Jack Krupansky
-Original Message- From: Fuad Efendi
Sent: Monday, August 20, 2012 4:34 PM
To: Solr-User@lucene.apache.org
Subject: UnInvertedField
for a specific term MyTerm, and when I execute
query channel:MyTerm it shows 650 documents found… possibly a bug… it
happens after I commit data too, nothing changes; and this field is
single-valued non-tokenized string.
-Fuad
--
Fuad Efendi
416-993-2060
http://www.tokenizer.ca
Hi there,
Load term Info shows 3650 for a specific term MyTerm, and when I execute
query channel:MyTerm it shows 650 documents found… possibly a bug… it
happens after I commit data too, nothing changes; and this field is
single-valued non-tokenized string.
-Fuad
--
Fuad Efendi
416-993-2060
http
too, nothing changes; and this field is
single-valued non-tokenized string.
-Fuad
--
Fuad Efendi
416-993-2060
http://www.tokenizer.ca
://solr-ra.tgels.org
Regards
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org
ps. Note: Apache Solr 4.0 with RankingAlgorithm 1.4.4 is an external
implementation
On 8/13/2012 11:38 AM, Fuad Efendi wrote:
SOLR-4.0
I am trying to implement this; funny idea
(SolrCore.java:1561)
--
Fuad Efendi
http://www.tokenizer.ca
)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHand
ler.java:204)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.
java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
--
Fuad Efendi
http
will accumulate search results from three layers, it will be
near real time.
Any thoughts? Thanks,
--
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
http://www.tokenizer.ca
http://www.linkedin.com/in/lucene
FWIW, when asked at what point one would want to split JVMs and shard
on the same machine, Grant Ingersoll mentioned 16GB, and precisely for
GC cost reasons. You're way above that.
- his index is 75G, and Grant mentioned RAM heap size; we can use terabytes
of index with 16Gb of memory.
, Web Services, Moreover, Web Ping, SQL-import, sitemaps-based,
intranets, and more.
Additionally to that, I can design super-rich UI extremely fast using tools
such as Liferay Portal, Apache Wicket, Vaadin.
Thanks,
--
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
http://www.tokenizer.ca http
I agree that SSD boosts performance... in some rare, not-real-life scenarios:
- super-frequent commits
That's it, nothing more, except the fact that a Lucene compile including
tests takes up to two minutes on a MacBook with SSD, or forty to fifty minutes on
Windows with HDD.
Of course, with non-empty
It's not Jetty. It is a broken TCP pipe on the client side; it happens when the
client closes the TCP connection.
And I even had this problem with recent Tomcat 6.
The problem disappeared after I explicitly tuned keep-alive in Tomcat, and started
using a monitoring thread with HttpClient and SOLRJ...
Fuad
I am using Lily for atomic index updates (implemented very nicely;
transactional; plus MapReduce; plus auto-denormalizing)
http://www.lilyproject.org
It slows down the mean time 7-10 times, but TPS is still the same
- Fuad
http://www.tokenizer.ca
Sent from my iPad
On 2011-11-10, at 9:59 PM,
I use -Xms3072M .
The Large CPU instance is virtualized and its behaviour is unpredictable.
Choose a cluster instance with an explicit Intel XEON CPU (instead of
CPU-Units) and compare behaviour; $1.60/hour. Please share results.
Thanks,
--
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data
I agree with Yonik of course;
But
You should see OOM errors in this case. With virtualization,
however, it is unpredictable, and the JVM may not even have a few bytes left to
output the OOM into a log file (because we are catching Throwable and trying to
generate HTTP 500 instead !!! Freaky)
fields per instance they don't have any problem outside Amazon
;)))
--
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca
On 11-08-17 11:08 PM, Fuad Efendi f...@efendi.ca wrote:
more investigation and I see that I have 100+ dynamic fields
(which has to be default setting in upcoming releases
Java 6)
Do we need to disable -XX:-DoEscapeAnalysis as IBM suggests?
http://www-01.ibm.com/support/docview.wss?uid=swg21422605
Thanks,
Fuad Efendi
http://www.tokenizer.ca
, Fuad Efendi f...@efendi.ca wrote:
Has anyone tried this? I cannot start Solr-Tomcat with the following options on
Ubuntu:
JAVA_OPTS=$JAVA_OPTS -Xms2048m -Xmx2048m -Xmn256m -XX:MaxPermSize=256m
JAVA_OPTS=$JAVA_OPTS -Dsolr.solr.home=/data/solr -Dfile.encoding=UTF8
-Duser.timezone=GMT
I think the question is strange... Maybe you are wondering about possible
OOM exceptions? I think we can pass to Lucene a single document containing a
comma-separated list of term, term, ... (a few billion times)... except for
stored fields and TermVectorComponent...
I believe thousands companies already indexed
Hi Otis,
I am recalling the pagination feature; it is still unresolved (with the default
scoring implementation): even with small documents, searching and retrieving
documents 1 to 10 can take 0 milliseconds, but from 100,000 to 100,010 can
take a few minutes (I saw it with the trunk version 6 months ago, and
WHERE KEY2=? ORDER BY KEY1 -
check everything...
Thanks,
--
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca http://www.tokenizer.ca/
On 11-06-05 12:09 AM, Rohit Gupta ro...@in-rev.com wrote:
No, I didn't double post; maybe it was in my
even for huge SQL-side max_connections.
If you are interested, I can continue work on SOLR-2233. CC: dev@lucene (is
anyone working on DIH improvements?)
Thanks,
Fuad Efendi
http://www.tokenizer.ca/
-Original Message-
From: François Schiettecatte [mailto:fschietteca...@gmail.com]
Sent: May
Has anyone noticed that it doesn't work? It's already been 2 weeks:
https://issues.apache.org/jira/browse/INFRA-3667
I don't receive WIKI change notifications. I CC'd 'Apache Wiki'
wikidi...@apache.org
Something is bad.
-Fuad
It could be environment-specific (specific to your top command
implementation, OS, etc.)
On CentOS I have 2986m of virtual memory showing although -Xmx2g.
You have 10g virtual although -Xmx6g.
Don't trust it too much... the top command may count OS buffers for opened
files, network sockets, JVM DLLs
Interesting wording:
"we want real-time search, we want simple multi-tenancy, and we want a
solution that is built for the cloud"
And later,
"built on top of Lucene."
Is that possible? :)
(what does real-time search mean anyway... and what is the cloud?)
community is growing!
P.S.
I never used
Nice article... 2 ms is better than 20 ms, but in another chart 50 seconds is not
as good as 3 seconds... Sorry for my vision...
SOLR pushed a huge amount of performance improvements into Lucene Core...
Sent on the TELUS Mobility network with BlackBerry
-Original Message-
From: Shashi Kant
Related: SOLR-846
Sent on the TELUS Mobility network with BlackBerry
-Original Message-
From: Erick Erickson erickerick...@gmail.com
Date: Tue, 7 Dec 2010 08:11:41
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Subject: Re: Out of memory error
Have you seen this
Batch size -1??? Strange, but it could be a problem.
Note also that you can't provide parameters to the default startup.sh command; you
should modify setenv.sh instead
--Original Message--
From: sivaprasad
To: solr-user@lucene.apache.org
ReplyTo: solr-user@lucene.apache.org
Subject: Out of memory
I experienced similar problems. It was because we didn't perform load stress
tests properly before going to production. Nothing lasts forever: replace the
controller, change the hardware vendor, maintain a low temperature inside the rack.
Thanks
--Original Message--
From: Robert Gründler
To:
You could set up a firewall that forbids any connection to your Solr
server port from everyone except the computer that hosts the application
that connects to Solr.
That way, only your application will be able to connect to Solr.
I believe firewalling is the only possible solution since SOLR doesn't
For making the Solr admin password-protected,
I used the Path Based Authentication form from
http://wiki.apache.org/solr/SolrSecurity.
In this way my admin area, search, delete, and add-to-index are protected. But
now, when I make Solr authenticated, every update/delete from the front
end is
Hi,
I've read very interesting interview with Ryan,
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and
-Videos/Interview-Ryan-McKinley
Another finding is
https://issues.apache.org/jira/browse/SOLR-773
(lucene/contrib/spatial)
Is there any more staff going on for SOLR
Funny, Arrays.copy() for HashMap... but something similar...
Anyway, I use the same values for initial size and max size, to be safe... and
to get the OOM at startup :)
-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: February-12-10 6:55 PM
To: solr-user
or since you specifically asked about deleting anything older
than X days (in this example I'm assuming x=7)...
<delete><query>createTime:[* TO NOW-7DAYS]</query></delete>
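A sketch of posting such a delete-by-query, assuming a local Solr with the default XML update handler (the host, port, and field name may differ in your setup):

```
curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>createTime:[* TO NOW-7DAYS]</query></delete>'
```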
hello *, quick question: what would I have to change in the query
parser to allow wildcarded terms to go through text analysis?
I believe it is illogical; wildcarded terms will go through the terms
enumerator.
SOLR doesn't come with such things...
Look at www.liferay.com; they have plugin for SOLR (in SVN trunk) so that
all documents / assets can be automatically indexed by SOLR (and you have
full freedom with defining specific SOLR schema settings); their portlets
support WebDAV, and Open Office looks
-portlets, but I never tried).
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
-Original Message-
From: Peter [mailto:zarato...@gmx.net]
Sent: January-16-10 10:17 AM
To: solr-user@lucene.apache.org
Subject: Fundamental questions of how to build up solr for huge portals
'!'
:)))
Plus, FastLRUCache (the previous one was synchronized)
(and of course warming-up time) := start complaining after ensuring there is
nothing to complain about :)
(and of course the OS needs time to cache filesystem blocks, and Java HotSpot,
... - a few minutes at least...)
On Feb 3, 2010, at 1:38 PM, Rajat
The Levenshtein algorithm is currently hardcoded (FuzzyTermEnum class) in Lucene 2.9.1
and 3.0...
There are samples of other distances in the contrib folder.
If you want to play with distance, check
http://issues.apache.org/jira/browse/LUCENE-2230
It works if the distance is an integer and follows the metric space axioms.
It may work well (but only if the query contains a term from the dictionary; it can't
work as a spellchecker)
Combining the 2 algorithms can boost performance extremely...
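As a reference, a minimal sketch of the integer Levenshtein edit distance discussed here (a classic dynamic-programming implementation, not Lucene's FuzzyTermEnum code):

```java
public class Levenshtein {
    // Classic DP edit distance. It returns an integer and satisfies the
    // metric-space axioms (non-negativity, symmetry, triangle inequality)
    // that a BK-tree requires.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```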
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search
I can only tell that the Liferay Portal (WebDAV) Document Library Portlet has the
same functionality as Sharepoint (it even has a /servlet/ URL with the suffix
'/sharepoint'); Liferay also has a plugin (web-hook) for SOLR (it has a generic
search wrapper; any kind of search service provider can be hooked in
Why embed indexing as a transaction dependency? An extremely weird
idea.
There is nothing weird about different use cases requiring different
approaches
If you're just thinking documents and text search ... then its less of
an issue.
If you have an online application where the
Even if commit takes 20 minutes?
I've never seen a commit take 20 minutes... (anything taking that long
is broken, perhaps in concept)
an index merge can take from a few minutes to a few hours. That's why nothing can
beat SOLR Master/Slave and sharding for huge datasets. And reopening of
Is there a limit on the size of the query string?
It looks like I get exceptions when the query string is longer than 400 characters
(on average)
Thanks!
, and field Canada (6 characters) in
another few; no any relational, it's done automatically without any
Compass/Hibernate/Table(s)
Don't think relational.
I wrote this 2 years ago:
http://www.theserverside.com/news/thread.tss?thread_id=50711#272351
Fuad Efendi
+1 416-993-2060
http
nothing.
Why embed indexing as a transaction dependency? An extremely weird idea. But
I understand some selling points...
SOLR: it is faster than Lucene. Filtered queries run faster than traditional
AND queries! And this is a real selling point.
Thanks,
Fuad Efendi
+1 416-993-2060
http
http://issues.apache.org/jira/browse/LUCENE-2230
Enjoy!
-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: January-19-10 11:32 PM
To: solr-user@lucene.apache.org
Subject: SOLR Performance Tuning: Fuzzy Searches, Distance, BK-Tree
Hi,
I am wondering: will SOLR
! (although I need to use a classic int
instead of the float distance by Lucene/Levenshtein etc.)
Thanks,
Fuad Efendi
+1 416-993-2060
http://www.tokenizer.ca/
Data Mining, Vertical Search
-03-10 10:03 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR: Replication
On Sat, Jan 2, 2010 at 11:35 PM, Fuad Efendi f...@efendi.ca wrote:
I tried... I set APR to improve performance... the server is slow while
replicating;
but top shows only 1% I/O wait... it is probably the environment
I used RSYNC before, and a 20Gb replica took less than an hour (20-40
minutes); now, with HTTP, it takes 5-6 hours...
The admin screen shows 952Kb/sec average speed; 100Mbps network, full-duplex; I
am using Tomcat Native for APR. 10x slower...
-Fuad
http://www.tokenizer.ca
, Fuad Efendi f...@efendi.ca wrote:
I used RSYNC before, and 20Gb replica took less than an hour (20-40
minutes); now, HTTP, and it takes 5-6 hours...
Admin screen shows 952Kb/sec average speed; 100Mbps network, full-
duplex; I
am using Tomcat Native for APR. 10x times slow...
Hmmm, did you
, WIKIs, Forum
Posts) is automatically indexed. Having separate SOLR definitely helps:
instead of hardcoding (with Lucene) we can now intelligently manage stop
words, stemming, language settings, and more.
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
Tokenizer Inc.
http
OutOfMemoryException.
I use highlight, faceting on nontokenized Country field, standard handler.
It even seems to be a bug...
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search
, Fuad Efendi wrote:
I used pagination for a while till I found this...
I have a filtered query ID:[* TO *] returning 20 million results (no
faceting), and pagination always seemed to be fast. However, fast only
with
low values for start (e.g. start=12345). Queries like start=28838540 take 40-60
.
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: December-24-09 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR Performance Tuning: Pagination
When do users do a query like that? --wunder
On Dec 24, 2009, at 8:09 AM, Fuad Efendi wrote
Well, SolrEntityProcessor users do :)
http://issues.apache.org/jira/browse/SOLR-1499
(which by the way I plan on polishing and committing over the holidays)
Erik
On Dec 24, 2009, at 8:09 AM, Fuad Efendi wrote:
I used pagination for a while till found this...
I have
to the standard /logs folder of Tomcat.
You may find additional logging configuration settings by googling for Java 5
Logging etc.
2009/12/20 Fuad Efendi f...@efendi.ca:
After researching how to configure default SOLR Tomcat logging, I
finally
disabled INFO-level for SOLR.
And performance
/Tomcat Logger slows down performance
much more than the read-only I/O of Lucene.
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search
for (int i = 0; i < toLog.size(); i++) {
    String name = toLog.getName(i);
    Object val = toLog.getVal(i);
    sb.append(name).append("=").append(val).append(" ");
}
log.info(logid + sb.toString());
...
-Fuad
-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent
By that I mean that the java/tomcat
process just disappears.
I had a similar problem when I started Tomcat via SSH, and then improperly
closed SSH without the exit command.
In some cases (OutOfMemory) there is not enough memory to generate a log (or the
CPU can be overloaded by the Garbage Collector to such
: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: November-03-09 5:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Lucene FieldCache memory requirements
On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi f...@efendi.ca wrote:
I believe this is correct estimate:
C. [maxdoc] x [4 bytes
Any thoughts regarding the subject? I hope FieldCache doesn't use more than
6 bytes per document-field instance... I am too lazy to research the Lucene
source code; I hope someone can provide an exact answer... Thanks
Subject: Lucene FieldCache memory requirements
Hi,
Can anyone confirm Lucene
: Lucene FieldCache memory requirements
Which FieldCache API are you using? getStrings? or getStringIndex
(which is used, under the hood, if you sort by this field).
Mike
On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi f...@efendi.ca wrote:
Any thoughts regarding the subject? I hope
, this is exceptionally wasteful. If
Lucene had simple bit-packed ints (I've opened LUCENE-1990 for this)
then it'd take much fewer bits to reference the values, since you have
only 10 unique string values.
Mike
On Mon, Nov 2, 2009 at 3:57 PM, Fuad Efendi f...@efendi.ca wrote:
I am
multi reader and just do the work to get the right
number (currently there is a comment that the user should do that work
if necessary, making the call unreliable for this).
Fuad Efendi wrote:
Thank you very much Mike,
I found it:
org.apache.solr.request.SimpleFacets
I just did some tests on a completely new index (Slave); sort by a
low-distribution non-tokenized field (such as Country) takes milliseconds,
but a sort (ascending) on a tokenized field with heavy distribution took 30
seconds (initially). A second sort (descending) took milliseconds. Generic
query *:*;
Mark,
I don't understand this:
so with a ton of docs and a few uniques, you get a temp boost in the RAM
reqs until it sizes it down.
Sizes down??? Why is it called Cache indeed? And how SOLR uses it if it is
not cache?
And this:
A pointer for each doc.
Why can't we use (int) DocumentID?
:
[512Mb ~ 1Gb] + [non_tokenized_fields_count] x [maxdoc] x [8 bytes]
-Fuad
-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: November-02-09 7:37 PM
To: solr-user@lucene.apache.org
Subject: RE: Lucene FieldCache memory requirements
Simple field (10 different values
The formula for a String Index fieldcache is essentially the String
array of unique terms (which does indeed size down at the bottom) and
the int array indexing into the String array.
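For context, that formula can be put into a back-of-the-envelope sketch (the per-String overhead and reference-size constants below are rough assumptions, not Lucene internals):

```java
public class FieldCacheEstimate {
    // Rough model of a Lucene StringIndex field cache: one int ord per
    // document, plus the array of unique terms. The 40-byte String overhead
    // and 8-byte reference size are assumptions, not measured values.
    static long estimateBytes(long maxDoc, long uniqueTerms, long avgTermChars) {
        long ordArray = maxDoc * 4L;                      // int ord per document
        long perString = 40L + 2L * avgTermChars;         // assumed String object cost
        return ordArray + uniqueTerms * (8L + perString); // 8 = assumed reference
    }

    public static void main(String[] args) {
        // e.g. 100M documents, a country field with 10 unique ~7-char values:
        System.out.println(estimateBytes(100_000_000L, 10L, 7L));
    }
}
```

With so few unique terms, the int ord array dominates: the estimate stays close to 4 bytes per document regardless of the country field's length.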
Fuad Efendi wrote:
To be correct, I analyzed FieldCache a while ago and I believed it never
Even in a simplistic scenario, when it is Garbage Collected, we still
_need_to_be_able_ to allocate enough RAM for FieldCache on demand... a linear
dependency on document count...
Hi Mark,
Yes, I understand it now; however, how will StringIndexCache size down in a
production system faceting by
FieldCache internally uses a WeakHashMap... nothing wrong, but... no
Garbage Collection tuning will help if the allocated RAM is not enough
for replacing Weak** with Strong**, especially for SOLR faceting... 10%-15% of
CPU taken by GC has been reported...
-Fuad
Hi,
Can anyone confirm Lucene FieldCache memory requirements? I have 100
million docs with a non-tokenized field country (10 different countries); I
expect it requires an array of (int, long), with array size 100,000,000,
without any impact from the country field length;
it requires 600,000,000 bytes: int
8 GB is much larger than is well supported. It's diminishing returns over
40-100 and mostly a waste of RAM. Too high and things can break. It
should be well below 2 GB at most, but I'd still recommend 40-100.
Fuad Efendi wrote:
Reason of having big RAM buffer is lowering frequency
Thanks for pointing to it, but it is so obvious:
1. The buffer is used as RAM storage for index updates
2. An int has 2^32 different values
3. We can have _up_to_ 2Gb of _Documents_ (stored as key-value pairs,
inverted index)
In the case of the 5 fields which I have, I need 5 arrays (up to 2Gb of