Re: Caused by: org.noggit.JSONParser$ParseException: Expected ',' or '}': char=",position=312 BEFORE='ssions"

2017-04-25 Thread Fuad Efendi
Yes, absolutely correct: a comma is missing at the end of line 10. All key-value pairs inside the same block should be comma-separated, except the last one. From: Shawn Heisey Reply: solr-user@lucene.apache.org

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Fuad Efendi
and prefer closer to an OR for smaller > collections. > > -Doug > > On Tue, Feb 21, 2017 at 1:39 PM Fuad Efendi <f...@efendi.ca > wrote: > >> Thank you Ahmet, I will try it; sounds reasonable >> >> >> From: Ahmet Arslan <iori...@yahoo.com.invalid >

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Fuad Efendi
t goes? Ahmet On Tuesday, February 21, 2017 4:28 AM, Fuad Efendi <f...@efendi.ca> wrote: Hello, Default TF-IDF performs poorly with the indexed 200 million documents. Query "Michael Jackson" may run 300ms, and "Michael The Jackson" over 3 seconds. eDisMax. Because

CPU Intensive Scoring Alternatives

2017-02-20 Thread Fuad Efendi
chael Jackson” runs 300ms instead of 3ms just because of the huge number of hits and TF-IDF calculations. Solr 6.3. Thanks, -- Fuad Efendi (416) 993-2060 http://www.tokenizer.ca Search Relevancy, Recommender Systems

Re: Solr 5.5.0 MSSQL Datasource Example

2017-02-07 Thread Fuad Efendi
user pass dbname localhost 1433 -- Fuad Efendi (416) 993-2060 http://www.tokenizer.ca Search Relevancy, Recommender Systems From: Per Newgro <per.new...@gmx.ch> <per.new...@gmx.ch> Repl

Re: Solr 5.3.1: Collection reload results in IndexWriter is closed exception

2017-02-07 Thread Fuad Efendi
Were you indexing new documents while reloading? “Previously we’ve done reloads of a collection after changing solrconfig.xml without any issues.” -- Fuad Efendi (416) 993-2060 http://www.tokenizer.ca Search Relevancy, Recommender Systems From: Kelly, Frank <frank.ke...@here.com> <

Re: Help with design choice: join or multiValued field

2017-02-06 Thread Fuad Efendi
Correct: multivalued field with 1 shop IDs. Use case: shopping network in U.S. for example for a big brand such as Walmart, when user implicitly provides IP address or explicitly Postal Code, so that we can find items in his/her neighbourhood. You basically provide “join” information via

Re: Time of insert

2017-02-06 Thread Fuad Efendi
No; historical logs for document updates are not provided. Users need to implement such functionality themselves if needed. From: Mahmoud Almokadem Reply: solr-user@lucene.apache.org

Re: How-To: Secure Solr by IP Address

2016-11-04 Thread Fuad Efendi
, it will simplify life ;) On November 4, 2016 at 12:05:13 PM, Fuad Efendi (f...@efendi.ca) wrote: Yes we need that documented, http://stackoverflow.com/questions/8924102/restricting-ip-addresses-for-jetty-and-solr Of course Firewall is a must for extremely strong environments / large corporations, DMZ

Re: How-To: Secure Solr by IP Address

2016-11-04 Thread Fuad Efendi
+ DMZ(s) -- Fuad Efendi (416) 993-2060 http://www.tokenizer.ca Search Relevancy, Recommender Systems On November 4, 2016 at 9:28:21 AM, David Smiley (david.w.smi...@gmail.com) wrote: I was just researching how to secure Solr by IP address and I finally figured it out. Perhaps this might go

Re: Different Sorts based on Different Groups

2016-11-04 Thread Fuad Efendi
ry different. I recently had an assignment at a well-known retail shop where we even designed pre-query custom boosts so that we could customize typical (most important for the business) queries as per business needs Thanks, -- Fuad Efendi (416) 993-2060 http://www.tokenizer.ca Search Relevancy, Recomm

Re: Problem with Password Decryption in Data Import Handler

2016-11-02 Thread Fuad Efendi
ed general connectivity/authentication problems. Thanks, Jamie On Wed, Nov 2, 2016 at 4:58 PM, Fuad Efendi <f...@efendi.ca> wrote: > In MySQL, this command will explicitly allow to connect from > remote ICZ2002912 host, check MySQL documentation: > > GRANT ALL ON my

Re: Problem with Password Decryption in Data Import Handler

2016-11-02 Thread Fuad Efendi
In MySQL, this command explicitly allows connections from the remote host ICZ2002912; check the MySQL documentation: GRANT ALL ON mysite.* TO 'root'@'ICZ2002912' IDENTIFIED BY 'Oakton123'; On November 2, 2016 at 4:41:48 PM, Fuad Efendi (f...@efendi.ca) wrote: This is the root of the problem

Re: Problem with Password Decryption in Data Import Handler

2016-11-02 Thread Fuad Efendi
you need to allow MySQL & Co. to accept connections from ICZ2002912. Plus, check DNS resolution, etc.  Thanks, -- Fuad Efendi (416) 993-2060 http://www.tokenizer.ca Recommender Systems On November 2, 2016 at 2:37:08 PM, Jamie Jackson (jamieja...@gmail.com) wrote: I'm at a brick wall. Here

Re: Timeout occured while waiting response from server at: http://***/solr/commodityReview

2016-11-02 Thread Fuad Efendi
sider sharding / SolrCloud if you need huge memory just for field cache. And you will be forced to consider it if you have more than 2 billion documents (am I right? Lucene internal limitation, Integer.MAX_VALUE) Thanks, -- Fuad Efendi (416) 993-2060 http://www.tokenizer.ca Search Rele

Re: Timeout occured while waiting response from server at: http://***/solr/commodityReview

2016-11-01 Thread Fuad Efendi
internal caches. Solr has a way to warm up internal caches before making a new searcher available: https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig Make these queries typical for your use cases (for instance, *:* with faceting): Thanks, -- Fuad Efendi (416

Foot, Inch: Stripping Out Special Characters: DisMax: WhitespaceTokenizer vs. Keyword Tokenizer

2016-03-10 Thread Fuad Efendi
. But it works fine with KeywordTokenizer. Any idea why? Thanks, --  Fuad Efendi http://www.tokenizer.ca Data Mining, Vertical Search

Re: Stopping Solr JVM on OOM

2016-02-25 Thread Fuad Efendi
;what is the best way to stop Solr when it gets in OOM” (or just becomes unresponsive because of swallowed exceptions) --  Fuad Efendi 416-993-2060(cell) On February 25, 2016 at 2:37:45 PM, CP Mishra (mishr...@gmail.com) wrote: Looking at the previous threads (and in our tests), oom script spec

RE: Solr HTTP client authentication

2014-11-17 Thread Fuad Efendi
I can manually create an httpclient and set up authentication but then I can't use solrj. Yes; correct; except that you _can_ use solrj with this custom HttpClient instance (which will intercept authentication, and which will support cookies, SSL or plain HTTP, Keep-Alive, etc.) You can

Please add me: FuadEfendi

2013-04-05 Thread Fuad Efendi
Hi, Few months ago I was able to modify Wiki; I can't do it now, probably because http://wiki.apache.org/solr/ContributorsGroup Please add me: FuadEfendi Thanks! -- Fuad Efendi, PhD, CEO C: (416)993-2060 F: (416)800-6479 Tokenizer Inc., Canada http://www.tokenizer.ca

contributor group

2013-04-05 Thread Fuad Efendi
Hi, Please add me: FuadEfendi Thanks! -- http://www.tokenizer.ca

RE: Can SOLR Index UTF-16 Text

2012-10-03 Thread Fuad Efendi
... -Fuad Efendi http://www.tokenizer.ca -Original Message- From: vybe3142 [mailto:vybe3...@gmail.com] Sent: October-03-12 12:30 PM To: solr-user@lucene.apache.org Subject: Re: Can SOLR Index UTF-16 Text Thanks for all the responses. Problem partially solved (see below) 1. In a sense, my

RE: Can SOLR Index UTF-16 Text

2012-10-03 Thread Fuad Efendi
your file to Solr) -Fuad Efendi http://www.tokenizer.ca -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: October-03-12 1:30 PM To: solr-user@lucene.apache.org Subject: RE: Can SOLR Index UTF-16 Text Something is missing from the body of your Email... As I pointed

RE: Can SOLR Index UTF-16 Text

2012-10-02 Thread Fuad Efendi
Solr can index bytearrays too: unigram, bigram, trigram... even bitsets, tritsets, qatrisets ;- ) LOL I got strong cold... BTW, don't forget to configure UTF-8 as your default (Java) container encoding... -Fuad

Re: UnInvertedField limitations

2012-09-06 Thread Fuad Efendi
have such large documents? This appears to be a hard limit based on 24 bits in a Java int. You can try facet.method=enum, but that may be too slow. What release of Solr are you running? -- Jack Krupansky -Original Message- From: Fuad Efendi Sent: Monday, August 20, 2012 4:34 PM To: Solr

Re: UnInvertedField limitations

2012-09-06 Thread Fuad Efendi
of 24 bits in a Java int. You can try facet.method=enum, but that may be too slow. What release of Solr are you running? -- Jack Krupansky -Original Message- From: Fuad Efendi Sent: Monday, August 20, 2012 4:34 PM To: Solr-User@lucene.apache.org Subject: UnInvertedField

RE: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

2012-08-25 Thread Fuad Efendi
for a specific term MyTerm, and when I execute query channel:MyTerm it shows 650 documents found... possibly bug... it happens after I commit data too, nothing changes; and this field is single-valued non-tokenized string. -Fuad -- Fuad Efendi 416-993-2060 http://www.tokenizer.ca

Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

2012-08-24 Thread Fuad Efendi
Hi there, Load term Info shows 3650 for a specific term MyTerm, and when I execute query channel:MyTerm it shows 650 documents found... possibly bug... it happens after I commit data too, nothing changes; and this field is single-valued non-tokenized string. -Fuad -- Fuad Efendi 416-993-2060 http

RE: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

2012-08-24 Thread Fuad Efendi
too, nothing changes; and this field is single-valued non-tokenized string. -Fuad -- Fuad Efendi 416-993-2060 http://www.tokenizer.ca

Re: Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

2012-08-20 Thread Fuad Efendi
://solr-ra.tgels.org Regards - Nagendra Nagarajayya http://solr-ra.tgels.org http://rankingalgorithm.tgels.org ps. Note: Apache Solr 4.0 with RankingAlgorithm 1.4.4 is an external implementation On 8/13/2012 11:38 AM, Fuad Efendi wrote: SOLR-4.0 I am trying to implement this; funny idea

UnInvertedField limitations

2012-08-20 Thread Fuad Efendi
(SolrCore.java:1561) -- Fuad Efendi http://www.tokenizer.ca

UnInvertedField limitations

2012-08-20 Thread Fuad Efendi
) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561) -- Fuad Efendi http

Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

2012-08-13 Thread Fuad Efendi
will accumulate search results from three layers, it will be near real time. Any thoughts? Thanks, -- Fuad Efendi 416-993-2060 Tokenizer Inc., Canada http://www.tokenizer.ca http://www.linkedin.com/in/lucene

RE: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-17 Thread Fuad Efendi
FWIW, when asked at what point one would want to split JVMs and shard, on the same machine, Grant Ingersoll mentioned 16GB, and precisely for GC cost reasons. You're way above that. - his index is 75G, and Grant mentioned RAM heap size; we can use terabytes of index with 16Gb memory.

Solr Consultant Available in Canada: Solr, HBase, Hadoop, Mahout, Lily

2012-04-16 Thread Fuad Efendi
, Web Services, Moreover, Web Ping, SQL-import, sitemaps-based, intranets, and more. Additionally to that, I can design super-rich UI extremely fast using tools such as Liferay Portal, Apache Wicket, Vaadin. Thanks, -- Fuad Efendi 416-993-2060 Tokenizer Inc., Canada http://www.tokenizer.ca http

Solr Consultant Available in Canada: Solr, HBase, Hadoop, Lily

2012-04-16 Thread Fuad Efendi
, Web Services, Moreover, Web Ping, SQL-import, sitemaps-based, intranets, and more. Additionally to that, I can design super-rich UI extremely fast using tools such as Liferay Portal, Apache Wicket, Vaadin. Thanks, -- Fuad Efendi 416-993-2060 Tokenizer Inc., Canada http://www.tokenizer.ca http

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Fuad Efendi
I agree that SSD boosts performance... In some rare not-real-life scenario: - super frequent commits That's it, nothing more except the fact that Lucene compile time including tests takes up to two minutes on MacBook with SSD, or forty-fifty minutes on Windows with HDD. Of course, with non-empty

Re: jetty error, broken pipe

2011-11-19 Thread Fuad Efendi
It's not Jetty. It is broken TCP pipe due to client-side. It happens when client closes TCP connection. And I even had this problem with recent Tomcat 6. Problem disappeared after I explicitly tuned keep-alive at Tomcat, and started using monitoring thread with HttpClient and SOLRJ... Fuad

Re: HBase Datasource

2011-11-10 Thread Fuad Efendi
I am using Lily for atomic index updates (implemented very nicely; transactionally; plus MapReduce; plus auto-denormalizing) http://www.lilyproject.org It slows down mean time 7-10 times, but TPS still the same - Fuad http://www.tokenizer.ca Sent from my iPad On 2011-11-10, at 9:59 PM,

Re: solr keeps dying every few hours.

2011-08-17 Thread Fuad Efendi
I use -Xms3072M . Large CPU instance is virtualization and behaviour is unpredictable. Choose cluster instance with explicit Intel XEON CPU (instead of CPU-Units) and compare behaviour; $1.60/hour. Please share results. Thanks, -- Fuad Efendi 416-993-2060 Tokenizer Inc., Canada Data

Re: solr keeps dying every few hours.

2011-08-17 Thread Fuad Efendi
I agree with Yonik of course; But... You should see OOM errors in this case. In case of virtualization however it is unpredictable... and if JVM doesn't have few bytes to output OOM into log file (because we are catching throwable and trying to generate HTTP 500 instead !!! Freaky...)

Re: solr keeps dying every few hours.

2011-08-17 Thread Fuad Efendi
fields per instance... they don't have any problem outside Amazon ;))) -- Fuad Efendi 416-993-2060 Tokenizer Inc., Canada Data Mining, Search Engines http://www.tokenizer.ca On 11-08-17 11:08 PM, Fuad Efendi f...@efendi.ca wrote: more investigation and I see that I have 100+ dynamic fields

Solr Performance Tuning: -XX:+AggressiveOpts

2011-07-27 Thread Fuad Efendi
(which has to be default setting in upcoming releases Java 6) Do we need to disable -XX:-DoEscapeAnalysis as IBM suggests? http://www-01.ibm.com/support/docview.wss?uid=swg21422605 Thanks, Fuad Efendi http://www.tokenizer.ca

Re: Solr Performance Tuning: -XX:+AggressiveOpts

2011-07-27 Thread Fuad Efendi
, Fuad Efendi f...@efendi.ca wrote: Anyone tried this? I can not start Solr-Tomcat with following options on Ubuntu: JAVA_OPTS=$JAVA_OPTS -Xms2048m -Xmx2048m -Xmn256m -XX:MaxPermSize=256m JAVA_OPTS=$JAVA_OPTS -Dsolr.solr.home=/data/solr -Dfile.encoding=UTF8 -Duser.timezone=GMT

Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
I think the question is strange... Maybe you are wondering about possible OOM exceptions? I think we can pass to Lucene a single document containing a comma-separated list of term, term, ... (few billion times)... Except stored and TermVectorComponent... I believe thousands of companies already indexed

Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
Hi Otis, I am recalling pagination feature, it is still unresolved (with default scoring implementation): even with small documents, searching-retrieving documents 1 to 10 can take 0 milliseconds, but from 100,000 to 100,010 can take few minutes (I saw it with trunk version 6 months ago, and

Re: URGENT HELP: Improving Solr indexing time

2011-06-04 Thread Fuad Efendi
WHERE KEY2=? ORDER BY KEY1 - check everything... Thanks, -- Fuad Efendi 416-993-2060 Tokenizer Inc., Canada Data Mining, Search Engines http://www.tokenizer.ca http://www.tokenizer.ca/ On 11-06-05 12:09 AM, Rohit Gupta ro...@in-rev.com wrote: No didn't double post, my be it was in my

RE: DIH: Exception with Too many connections

2011-05-31 Thread Fuad Efendi
even for huge SQL-side max_connections. If you are interested, I can continue work on SOLR-2233. CC: dev@lucene (is anyone working on DIH improvements?) Thanks, Fuad Efendi http://www.tokenizer.ca/ -Original Message- From: François Schiettecatte [mailto:fschietteca...@gmail.com] Sent: May

WIKI alerts

2011-05-31 Thread Fuad Efendi
Anyone noticed that it doesn't work? Already 2 weeks https://issues.apache.org/jira/browse/INFRA-3667 I don't receive WIKI change notifications. I CC to 'Apache Wiki' wikidi...@apache.org Something is bad. -Fuad

RE: Solr memory consumption

2011-05-31 Thread Fuad Efendi
It could be environment specific (specific of your top command implementation, OS, etc) I have on CentOS 2986m virtual memory showing although -Xmx2g You have 10g virtual although -Xmx6g Don't trust it too much... top command may count OS buffers for opened files, network sockets, JVM DLLs

RE: Solr vs ElasticSearch

2011-05-31 Thread Fuad Efendi
Interesting wordings: we want real-time search, we want simple multi-tenancy, and we want a solution that is built for the cloud And later, built on top of Lucene. Is that possible? :) (what does that mean real time search anyway... and what is cloud?) community is growing! P.S. I never used

Re: Solr vs ElasticSearch

2011-05-31 Thread Fuad Efendi
Nice article... 2 ms better than 20 ms, but in another chart 50 seconds are not as good as 3 seconds... Sorry for my vision... SOLR pushed into Lucene Core huge amount of performance improvements... Sent on the TELUS Mobility network with BlackBerry -Original Message- From: Shashi Kant

Re: Out of memory error

2010-12-07 Thread Fuad Efendi
Related: SOLR-846 Sent on the TELUS Mobility network with BlackBerry -Original Message- From: Erick Erickson erickerick...@gmail.com Date: Tue, 7 Dec 2010 08:11:41 To: solr-user@lucene.apache.org Reply-To: solr-user@lucene.apache.org Subject: Re: Out of memory error Have you seen this

Re: Out of memory error

2010-12-06 Thread Fuad Efendi
Batch size -1??? Strange but could be a problem. Note also you can't provide parameters to default startup.sh command; you should modify setenv.sh instead --Original Message-- From: sivaprasad To: solr-user@lucene.apache.org ReplyTo: solr-user@lucene.apache.org Subject: Out of memory

Re: Dataimporthandler crashed raidcontroller

2010-11-04 Thread Fuad Efendi
I experienced similar problems. It was because we didn't perform load stress tests properly, before going to production. Nothing is forever, replace controller, change hardware vendor, maintain low temperature inside a rack. Thanks --Original Message-- From: Robert Gründler To:

RE: Need feedback on solr security

2010-02-17 Thread Fuad Efendi
You could set a firewall that forbid any connection to your Solr's server port to everyone, except the computer that host your application that connect to Solr. So, only your application will be able to connect to Solr. I believe firewalling is the only possible solution since SOLR doesn't

RE: Need feedback on solr security

2010-02-17 Thread Fuad Efendi
For Making by solr admin password protected, I had used the Path Based Authentication from http://wiki.apache.org/solr/SolrSecurity. In this way my admin area,search,delete,add to index is protected.But Now when I make solr authenticated then for every update/delete from the front end is

Range Queries, Geospatial

2010-02-16 Thread Fuad Efendi
Hi, I've read a very interesting interview with Ryan, http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Interview-Ryan-McKinley Another finding is https://issues.apache.org/jira/browse/SOLR-773 (lucene/contrib/spatial) Is there any more stuff going on for SOLR

RE: For caches, any reason to not set initialSize and size to the same value?

2010-02-12 Thread Fuad Efendi
Funny, Arrays.copy() for HashMap... but something similar... Anyway, I use same values for initial size and max size, to be safe... and to have OOM at startup :) -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: February-12-10 6:55 PM To: solr-user

RE: expire/delete documents

2010-02-12 Thread Fuad Efendi
or since you specifically asked about deleting anything older than X days (in this example i'm assuming x=7)... <delete><query>createTime:[NOW-7DAYS TO *]</query></delete> createTime:[* TO NOW-7DAYS]
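
A minimal sketch of the corrected delete-by-query from this reply: documents older than N days match the range `[* TO NOW-NDAYS]`, while `[NOW-NDAYS TO *]` matches the newer ones. The `createTime` field name comes from the thread; the class and method names are hypothetical, and the code only builds the XML update message, it does not send it to Solr.

```java
// Hypothetical helper: builds a Solr XML delete-by-query message.
final class DeleteOlderThan {

    // Returns the update message that removes every document whose
    // date field value is older than the given number of days.
    static String deleteQuery(String dateField, int days) {
        if (days < 0) {
            throw new IllegalArgumentException("days must be >= 0");
        }
        // Open lower bound (*) up to "now minus N days" selects the old documents.
        return "<delete><query>" + dateField + ":[* TO NOW-" + days + "DAYS]</query></delete>";
    }

    public static void main(String[] args) {
        System.out.println(deleteQuery("createTime", 7));
    }
}
```

The message would typically be posted to the update handler and followed by a commit.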

RE: analysing wild carded terms

2010-02-10 Thread Fuad Efendi
hello *, quick question, what would i have to change in the query parser to allow wildcarded terms to go through text analysis? I believe it is illogical. wildcarded terms will go through terms enumerator.

RE: Solr integration with document management systems

2010-02-06 Thread Fuad Efendi
SOLR doesn't come with such things... Look at www.liferay.com; they have plugin for SOLR (in SVN trunk) so that all documents / assets can be automatically indexed by SOLR (and you have full freedom with defining specific SOLR schema settings); their portlets support WebDAV, and Open Office looks

RE: Fundamental questions of how to build up solr for huge portals

2010-02-05 Thread Fuad Efendi
-portlets, but I never tried). Fuad Efendi +1 416-993-2060 http://www.linkedin.com/in/liferay -Original Message- From: Peter [mailto:zarato...@gmx.net] Sent: January-16-10 10:17 AM To: solr-user@lucene.apache.org Subject: Fundamental questions of how to build up solr for huge portals

RE: Solr response extremely slow

2010-02-04 Thread Fuad Efendi
'!' :))) Plus, FastLRUCache (previous one was synchronized) (and of course warming-up time) := start complains after ensuring there are no complains :) (and of course OS needs time to cache filesystem blocks, and Java HotSpot, ... - few minutes at least...) On Feb 3, 2010, at 1:38 PM, Rajat

RE: fuzzy matching / configurable distance function?

2010-02-04 Thread Fuad Efendi
Levenshtein algo is currently hardcoded (FuzzyTermEnum class) in Lucene 2.9.1 and 3.0... There are samples of other distance in contrib folder If you want to play with distance, check http://issues.apache.org/jira/browse/LUCENE-2230 It works if distance is integer and follows metric space axioms:
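
For reference, a textbook dynamic-programming Levenshtein distance with an integer result is sketched below. This is not the code from FuzzyTermEnum or from LUCENE-2230; the class name is hypothetical and the sketch only illustrates an integer-valued distance that satisfies the metric axioms (identity, symmetry, triangle inequality).

```java
// Hypothetical illustration: classic two-row Levenshtein edit distance.
final class Levenshtein {

    // Number of single-character insertions, deletions and substitutions
    // needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```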

SOLR Performance Tuning: Fuzzy Search

2010-02-03 Thread Fuad Efendi
. It may work well (but only if query contains term from dictionary; it can't work as a spellchecker) Combination 2 algos can boost performance extremely... Fuad Efendi +1 416-993-2060 http://www.linkedin.com/in/liferay Tokenizer Inc. http://www.tokenizer.ca/ Data Mining, Vertical Search

RE: Comparison of Solr with Sharepoint Search

2010-01-26 Thread Fuad Efendi
I can only tell that Liferay Portal (WebDAV) Document Library Portlet has same functionality as Sharepoint (it has even /servlet/ URL with suffix '/sharepoint'); Liferay also has plugin (web-hook) for SOLR (it has generic search wrapper; any kind of search service provider can be hooked in

RE: Solr vs. Compass

2010-01-25 Thread Fuad Efendi
Why to embed indexing as a transaction dependency? Extremely weird idea. There is nothing weird about different use cases requiring different approaches If you're just thinking documents and text search ... then its less of an issue. If you have an online application where the

RE: Solr vs. Compass

2010-01-25 Thread Fuad Efendi
Even if commit takes 20 minutes? I've never seen a commit take 20 minutes... (anything taking that long is broken, perhaps in concept) index merge can take from few minutes to few hours. That's why nothing can beat SOLR Master/Slave and sharding for huge datasets. And reopening of

Is there limit on size of query string?

2010-01-22 Thread Fuad Efendi
Is there limit on size of query string? Looks like I have exceptions when query string is higher than 400 characters (average) Thanks!

RE: Solr vs. Compass

2010-01-22 Thread Fuad Efendi
, and field Canada (6 characters) in another few; no any relational, it's done automatically without any Compass/Hibernate/Table(s) Don't think relational. I wrote this 2 years ago: http://www.theserverside.com/news/thread.tss?thread_id=50711#272351 Fuad Efendi +1 416-993-2060 http

RE: Solr vs. Compass

2010-01-22 Thread Fuad Efendi
nothing. Why to embed indexing as a transaction dependency? Extremely weird idea. But I understand some selling points... SOLR: it is faster than Lucene. Filtered queries run faster than traditional AND queries! And this is real selling point. Thanks, Fuad Efendi +1 416-993-2060 http

RE: SOLR Performance Tuning: Fuzzy Searches, Distance, BK-Tree

2010-01-22 Thread Fuad Efendi
http://issues.apache.org/jira/browse/LUCENE-2230 Enjoy! -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: January-19-10 11:32 PM To: solr-user@lucene.apache.org Subject: SOLR Performance Tuning: Fuzzy Searches, Distance, BK-Tree Hi, I am wondering: will SOLR

SOLR Performance Tuning: Fuzzy Searches, Distance, BK-Tree

2010-01-19 Thread Fuad Efendi
! (although I need to use classic int instead of float distance by Lucene/Levenshtein etc.) Thanks, Fuad Efendi +1 416-993-2060 http://www.tokenizer.ca/ Data Mining, Vertical Search

RE: SOLR: Replication

2010-01-03 Thread Fuad Efendi
-03-10 10:03 AM To: solr-user@lucene.apache.org Subject: Re: SOLR: Replication On Sat, Jan 2, 2010 at 11:35 PM, Fuad Efendi f...@efendi.ca wrote: I tried... I set APR to improve performance... server is slow while replica; but top shows only 1% of I/O wait... it is probably environment

SOLR: Replication

2010-01-02 Thread Fuad Efendi
I used RSYNC before, and 20Gb replica took less than an hour (20-40 minutes); now, HTTP, and it takes 5-6 hours... Admin screen shows 952Kb/sec average speed; 100Mbps network, full-duplex; I am using Tomcat Native for APR. 10x slower... -Fuad http://www.tokenizer.ca

RE: SOLR: Replication

2010-01-02 Thread Fuad Efendi
, Fuad Efendi f...@efendi.ca wrote: I used RSYNC before, and 20Gb replica took less than an hour (20-40 minutes); now, HTTP, and it takes 5-6 hours... Admin screen shows 952Kb/sec average speed; 100Mbps network, full-duplex; I am using Tomcat Native for APR. 10x times slow... Hmmm, did you

SOLR: Portlet (Plugin) for Lifeay Portal

2009-12-25 Thread Fuad Efendi
, WIKIs, Forum Posts) is automatically indexed. Having separate SOLR definitely helps: instead of hardcoding (with Lucene) we can now intelligently manage stop words, stemming, language settings, and more. Fuad Efendi +1 416-993-2060 http://www.linkedin.com/in/liferay Tokenizer Inc. http

SOLR Performance Tuning: Pagination

2009-12-24 Thread Fuad Efendi
OutOfMemoryException. I use highlight, faceting on nontokenized Country field, standard handler. It even seems to be a bug... Fuad Efendi +1 416-993-2060 http://www.linkedin.com/in/liferay Tokenizer Inc. http://www.tokenizer.ca/ Data Mining, Vertical Search

RE: SOLR Performance Tuning: Pagination

2009-12-24 Thread Fuad Efendi
, Fuad Efendi wrote: I used pagination for a while till found this... I have filtered query ID:[* TO *] returning 20 million results (no faceting), and pagination always seemed to be fast. However, fast only with low values for start=12345. Queries like start=28838540 take 40-60

RE: SOLR Performance Tuning: Pagination

2009-12-24 Thread Fuad Efendi
. -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: December-24-09 11:37 AM To: solr-user@lucene.apache.org Subject: Re: SOLR Performance Tuning: Pagination When do users do a query like that? --wunder On Dec 24, 2009, at 8:09 AM, Fuad Efendi wrote

RE: SOLR Performance Tuning: Pagination

2009-12-24 Thread Fuad Efendi
Well, SolrEntityProcessor users do :) http://issues.apache.org/jira/browse/SOLR-1499 (which by the way I plan on polishing and committing over the holidays) Erik On Dec 24, 2009, at 8:09 AM, Fuad Efendi wrote: I used pagination for a while till found this... I have

RE: SOLR Performance Tuning: Disable INFO Logging.

2009-12-21 Thread Fuad Efendi
to standard /logs folder of Tomcat. You may find additional logging configuration settings by google for Java 5 Logging etc. 2009/12/20 Fuad Efendi f...@efendi.ca: After researching how to configure default SOLR Tomcat logging, I finally disabled INFO-level for SOLR. And performance

SOLR Performance Tuning: Disable INFO Logging.

2009-12-20 Thread Fuad Efendi
/Tomcat Logger slows down performance much higher than read-only I/O of Lucene. Fuad Efendi +1 416-993-2060 http://www.linkedin.com/in/liferay Tokenizer Inc. http://www.tokenizer.ca/ Data Mining, Vertical Search

RE: SOLR Performance Tuning: Disable INFO Logging.

2009-12-20 Thread Fuad Efendi
; i<toLog.size(); i++) { String name = toLog.getName(i); Object val = toLog.getVal(i); sb.append(name).append('=').append(val).append(' '); } log.info(logid + sb.toString());... ... -Fuad -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent

RE: solr stops running periodically

2009-11-16 Thread Fuad Efendi
By that I mean that the java/tomcat process just disappears. I had similar problem when I started Tomcat via SSH, and then I improperly closed SSH without exit command. In some cases (OutOfMemory) memory is not enough to generate log (or CPU can be overloaded by Garbage Collector to such

RE: Lucene FieldCache memory requirements

2009-11-03 Thread Fuad Efendi
: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: November-03-09 5:00 AM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache memory requirements On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi f...@efendi.ca wrote: I believe this is correct estimate: C. [maxdoc] x [4 bytes

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Any thoughts regarding the subject? I hope FieldCache doesn't use more than 6 bytes per document-field instance... I am too lazy to research Lucene source code, I hope someone can provide exact answer... Thanks Subject: Lucene FieldCache memory requirements Hi, Can anyone confirm Lucene

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
: Lucene FieldCache memory requirements Which FieldCache API are you using? getStrings? or getStringIndex (which is used, under the hood, if you sort by this field). Mike On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi f...@efendi.ca wrote: Any thoughts regarding the subject? I hope

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
, this is exceptionally wasteful. If Lucene had simple bit-packed ints (I've opened LUCENE-1990 for this) then it'd take much fewer bits to reference the values, since you have only 10 unique string values. Mike On Mon, Nov 2, 2009 at 3:57 PM, Fuad Efendi f...@efendi.ca wrote: I am

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
multi reader and just do the work to get the right number (currently there is a comment that the user should do that work if necessary, making the call unreliable for this). Fuad Efendi wrote: Thank you very much Mike, I found it: org.apache.solr.request.SimpleFacets

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
I just did some tests in a completely new index (Slave), sort by low-distributed non-tokenized Field (such as Country) takes milliseconds, but sort (ascending) on tokenized field with heavy distribution took 30 seconds (initially). Second sort (descending) took milliseconds. Generic query *:*;

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Mark, I don't understand this: so with a ton of docs and a few uniques, you get a temp boost in the RAM reqs until it sizes it down. Sizes down??? Why is it called Cache indeed? And how SOLR uses it if it is not cache? And this: A pointer for each doc. Why can't we use (int) DocumentID?

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
: [512Mb ~ 1Gb] + [non_tokenized_fields_count] x [maxdoc] x [8 bytes] -Fuad -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: November-02-09 7:37 PM To: solr-user@lucene.apache.org Subject: RE: Lucene FieldCache memory requirements Simple field (10 different values
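
The rule of thumb quoted above, [512Mb ~ 1Gb] + [non_tokenized_fields_count] x [maxdoc] x [8 bytes], can be worked through as plain arithmetic. The sketch below is only that arithmetic; the class and method names are hypothetical, the 8 bytes per document is the thread's own estimate (other replies in the thread argue for 4 bytes), and nothing here measures a real FieldCache.

```java
// Hypothetical calculator for the thread's FieldCache heap rule of thumb.
final class FieldCacheEstimate {

    // Bytes needed for the per-document arrays of all cached fields.
    static long fieldArraysBytes(int fieldCount, long maxDoc, int bytesPerDoc) {
        long perField = Math.multiplyExact(maxDoc, (long) bytesPerDoc);
        return Math.multiplyExact((long) fieldCount, perField);
    }

    // Base heap plus the per-document arrays.
    static long estimateBytes(long baseBytes, int fieldCount, long maxDoc, int bytesPerDoc) {
        return Math.addExact(baseBytes, fieldArraysBytes(fieldCount, maxDoc, bytesPerDoc));
    }

    public static void main(String[] args) {
        long base = 512L * 1024 * 1024; // lower end of the 512Mb ~ 1Gb base
        long maxDoc = 100_000_000L;     // 100 million documents, as in the thread
        // One non-tokenized field at 8 bytes per document.
        System.out.println(estimateBytes(base, 1, maxDoc, 8)); // prints 1336870912
    }
}
```

With five such fields the per-document arrays alone come to 4,000,000,000 bytes, which shows the linear dependency on document count discussed in the thread.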

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
; } }; The formula for a String Index fieldcache is essentially the String array of unique terms (which does indeed size down at the bottom) and the int array indexing into the String array. Fuad Efendi wrote: To be correct, I analyzed FieldCache awhile ago and I believed it never

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Even in simplistic scenario, when it is Garbage Collected, we still _need_to_be_able_ to allocate enough RAM to FieldCache on demand... linear dependency on document count... Hi Mark, Yes, I understand it now; however, how will StringIndexCache size down in a production system faceting by

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
FieldCache uses internally WeakHashMap... nothing wrong, but... no any Garbage Collection tuning will help in case if allocated RAM is not enough for replacing Weak** with Strong**, especially for SOLR faceting... 10%-15% CPU taken by GC were reported... -Fuad

Lucene FieldCache memory requirements

2009-10-30 Thread Fuad Efendi
Hi, Can anyone confirm Lucene FieldCache memory requirements? I have 100 million docs with non-tokenized field country (10 different countries); I expect it requires array of (int, long), size of array 100,000,000, without any impact of country field length; it requires 600,000,000 bytes: int

RE: Too many open files

2009-10-24 Thread Fuad Efendi
8 GB is much larger than is well supported. Its diminishing returns over 40-100 and mostly a waste of RAM. Too high and things can break. It should be well below 2 GB at most, but I'd still recommend 40-100. Fuad Efendi wrote: Reason of having big RAM buffer is lowering frequency

RE: Too many open files

2009-10-24 Thread Fuad Efendi
Thanks for pointing to it, but it is so obvious: 1. Buffer is used as a RAM storage for index updates 2. int has 2 x Gb different values (2^^32) 3. We can have _up_to_ 2Gb of _Documents_ (stored as key-value pairs, inverted index) In case of 5 fields which I have, I need 5 arrays (up to 2Gb of
