How to avoid the unexpected character error?

2012-03-14 Thread neosky
I use XML to index the data. One field might contain some characters like '' =. It seems that will produce the error. I changed that field so it isn't indexed, but it doesn't work. I need to store the field, but it doesn't need to be indexed. Thanks!

Too many open files - lots of sockets

2012-03-14 Thread Colin Howe
Hello, We keep hitting the too many open files exception. Looking at lsof we have a lot (several thousand) of entries like this: java 19339 root 1619u sock 0,7 0t0 682291383 can't identify protocol However, netstat -a doesn't show any of these. Can

Re: Too many open files - lots of sockets

2012-03-14 Thread Markus Jelsma
Are you running trunk and have auto-commit enabled? Then disable auto-commit. Even if you increase ulimits it will continue to swallow all available file descriptors. On Wed, 14 Mar 2012 10:13:55 +, Colin Howe co...@conversocial.com wrote: Hello, We keep hitting the too many open files

Re: Too many open files - lots of sockets

2012-03-14 Thread Colin Howe
Currently using 3.4.0. We have autocommit enabled but we manually do commits every 100 documents anyway... I can turn it off if you think that might help. Cheers, Colin On Wed, Mar 14, 2012 at 10:24 AM, Markus Jelsma markus.jel...@openindex.io wrote: Are you running trunk and have auto-commit

Re: Too many open files - lots of sockets

2012-03-14 Thread Michael Kuhlmann
I had the same problem, without auto-commit. I never really found out what exactly the reason was, but I think it was because commits were triggered before a previous commit had the chance to finish. We now commit after every minute or 1000 (quite large) documents, whatever comes first. And
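The commit rule described above (commit after a time limit or a document count, whichever comes first) can be sketched as a small helper. This is only an illustration, not code from the thread: the class name CommitPolicy and its methods are hypothetical, and the caller is assumed to issue the actual Solr commit and then report it back via committed().

```java
/**
 * Hypothetical helper (not part of Solr): decides when a commit is due,
 * either after maxDocs pending documents or once maxAgeMillis has passed
 * since the last commit, whichever comes first.
 */
public class CommitPolicy {
    private final int maxDocs;
    private final long maxAgeMillis;
    private int pendingDocs = 0;
    private long lastCommitMillis;

    public CommitPolicy(int maxDocs, long maxAgeMillis, long nowMillis) {
        this.maxDocs = maxDocs;
        this.maxAgeMillis = maxAgeMillis;
        this.lastCommitMillis = nowMillis;
    }

    /** Records one added document and reports whether a commit is now due. */
    public boolean documentAdded(long nowMillis) {
        pendingDocs++;
        return shouldCommit(nowMillis);
    }

    /** A commit is due only if something is pending and a limit was reached. */
    public boolean shouldCommit(long nowMillis) {
        if (pendingDocs == 0) {
            return false;
        }
        return pendingDocs >= maxDocs
                || nowMillis - lastCommitMillis >= maxAgeMillis;
    }

    /** Call after the commit has finished, so the next one cannot overlap it. */
    public void committed(long nowMillis) {
        pendingDocs = 0;
        lastCommitMillis = nowMillis;
    }
}
```

With maxDocs = 1000 and maxAgeMillis = 60000 this matches the numbers mentioned in the message. Resetting the counters only after the commit returns is one way to avoid triggering a new commit while the previous one is still running.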

Sorting on non-stored field

2012-03-14 Thread Finotti Simone
I was wondering: is it possible to sort a Solr result-set on a non-stored value? Thank you

Re: Sorting on non-stored field

2012-03-14 Thread Michael Kuhlmann
On 14.03.2012 11:43, Finotti Simone wrote: I was wondering: is it possible to sort a Solr result-set on a non-stored value? Yes, it is. It must be indexed, though. -Kuli

Re: Sorting on non-stored field

2012-03-14 Thread Li Li
It should be indexed but not analyzed. It doesn't need to be stored. Reading field values from stored fields is extremely slow, so Lucene uses StringIndex to read field values for sorting. So if you want to sort by some field, you should index this field and not analyze it. On Wed, Mar 14, 2012 at 6:43 PM,

Re: How to avoid the unexpected character error?

2012-03-14 Thread Li Li
There is a class org.apache.solr.common.util.XML in Solr. You can use this wrapper: public static String escapeXml(String s) throws IOException { StringWriter sw = new StringWriter(); XML.escapeCharData(s, sw); return sw.getBuffer().toString(); } On Wed, Mar 14, 2012
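For readers without the Solr classes on the classpath, the idea behind that wrapper can be sketched with the JDK alone. This is a stand-in written for illustration, not Solr's implementation: the class XmlEscape is hypothetical, and it only escapes the three characters that break XML character data (<, > and &). It does not strip control characters that are illegal in XML.

```java
/**
 * Hypothetical stand-in for the Solr XML helper mentioned above:
 * escapes text so it can be placed inside an XML element such as
 * <field name="...">...</field>.
 */
public final class XmlEscape {
    private XmlEscape() {}

    /** Escapes <, > and & in character data; other characters pass through. */
    public static String escapeCharData(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The escaping has to happen when the XML file is generated, before it is posted to Solr. Attribute values would additionally need quote characters escaped, which this sketch leaves out.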

Re: Too many open files - lots of sockets

2012-03-14 Thread Colin Howe
After some more digging around I discovered that there was a bug reported in jetty 6: https://jira.codehaus.org/browse/JETTY-1458 This prompted me to upgrade to Jetty 7 and things look a bit more stable now :) On Wed, Mar 14, 2012 at 10:26 AM, Michael Kuhlmann k...@solarier.de wrote: I had

Re: Trouble indexing word documents

2012-03-14 Thread Tomás Fernández Löbbe
Well, this is another error. Looks like you are using cores and you are not adding the core name to the URL. Make sure you do it: http://localhost:8585/solr/[CORENAME]/update/extract?literal.id=1&commit=true The core name is the one you defined in solr.xml and should always be used in the URL. If

Re: Too many open files - lots of sockets

2012-03-14 Thread Michael Kuhlmann
Ah, good to know! Thank you! I already had Jetty under suspicion, but we had this failure quite often in October and November, when the bug was not yet reported. -Kuli On 14.03.2012 12:08, Colin Howe wrote: After some more digging around I discovered that there was a bug reported in jetty

Dynamically changing facet hierarchies and facet values

2012-03-14 Thread Sphene Software
Hello, I have a use case where the facet hierarchies as well as facet names change very frequently. For example: (Smartphones Android) may become Smartphones GSM Android, or Smartphone could be renamed to Smart Phone. If I use traditional hierarchical faceting, then every

RE: solr 3.5 and indexing performance

2012-03-14 Thread Agnieszka Kukałowicz
Bug ticket created: https://issues.apache.org/jira/browse/SOLR-3245 I also ran the test you asked for with an English dictionary. The results are in the ticket. Agnieszka -Original Message- From: Jan Høydahl [mailto:jan@cominvent.com] Sent: Wednesday, March 14, 2012 12:54 AM To:

Re: Too many open files - lots of sockets

2012-03-14 Thread Erick Erickson
Colin: FYI, you might consider just setting up the autocommit (or commitWithin if you're using SolrJ) for some reasonable interval (I often use 10 minutes or so). Even though you've figured it is a Tomcat issue, each commit causes searcher re-opens, perhaps replication in a master/slave setup,

read only slaves and write only master

2012-03-14 Thread Mike Austin
Is there a way to mark a master as write only and the slaves as read only? I guess I could just remove those handlers from the config? Is there a benefit from doing this as far as performance or anything else? Thanks, Mike

Re: SolrCloud Force replication without restarting

2012-03-14 Thread Mark Miller
On Mar 14, 2012, at 12:03 PM, Jamie Johnson wrote: Is there a way to force Solr to do a replication without restarting when in SolrCloud? You mean force a recovery? If so, then yes: there is a CoreAdminCommand (http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler) called REQUESTRECOVERY.

Re: How to avoid the unexpected character error?

2012-03-14 Thread neosky
Thanks! Does the schema.xml support this parameter? I am using the example post.jar to index my file.

Re: SolrCloud Force replication without restarting

2012-03-14 Thread Jamie Johnson
Great, so to be clear, I would execute the following, correct? http://localhost:8983/solr/admin/cores?action=REQUESTRECOVERY&core=slice1_shard2 On Wed, Mar 14, 2012 at 12:18 PM, Mark Miller markrmil...@gmail.com wrote: On Mar 14, 2012, at 12:03 PM, Jamie Johnson wrote: Is there a way to force

Re: SolrCloud Force replication without restarting

2012-03-14 Thread Mark Miller
Yeah, that looks right to me. On Mar 14, 2012, at 12:54 PM, Jamie Johnson wrote: Great, so to be clear I would execute the following correct? http://localhost:8983/solr/admin/cores?action=REQUESTRECOVERY&core=slice1_shard2 On Wed, Mar 14, 2012 at 12:18 PM, Mark Miller markrmil...@gmail.com

Re: Using two repeater to rapidly switching Master and Slave (Replication)?

2012-03-14 Thread stockii
Did your configuration work? I have the same issue and I don't know if it works... I have 2 servers, each with 2 Solr instances (one for updates, the other for searching). Now I need replication from solr1 to solr2. But what the hell does Solr do if the master crashes???

Solr Memory Usage

2012-03-14 Thread Mike Austin
I'm looking at the solr admin interface site. On the dashboard right panel, I see three sections with size numbers like 227MB(light), 124MB(darker), and 14MB(darkest). I'm on a windows server. Couple questions about what I see in the solr app admin interface: - In the top right section of the

Solr core swap after rebuild in HA-setup / High-traffic

2012-03-14 Thread KeesSchepers
Hello everybody, I am designing a new Solr architecture for one of my clients. This Solr architecture is for a high-traffic website with millions of visitors, but I am facing some design problems where I hope you guys could help me out. In my situation there are 4 Solr servers running, 1 server is

RE: Solr core swap after rebuild in HA-setup / High-traffic

2012-03-14 Thread Young, Cody
We have a very similar system. In our case we have a row version field in our sql database. When we run the full import we keep track of the latest row version at the time that the full import started. Once the full import is done we run an optimize and then run a delta import (actually this is

problems with DisjunctionMaxQuery and early-termination

2012-03-14 Thread Carlos Gonzalez-Cadenas
Hello all, We have a SOLR index filled with user queries and we want to retrieve the ones that are more similar to a given query entered by an end-user. It is kind of a related queries system. The index is pretty big and we're using early-termination of queries (with the index sorted so that the

Re: index size with replication

2012-03-14 Thread Mike Austin
The odd thing is that if I optimize the index it doubles in size. If I then add one more document to the index it goes back down to half size? Is there a way to force this without needing to wait until another document is added? Or do you have more information on what you think is going on? I'm

Re: Solr core swap after rebuild in HA-setup / High-traffic

2012-03-14 Thread KeesSchepers
Well, the point is as follows. We have a MySQL table where all the changes are tracked, something very similar to your situation. The first problem is that the delta-import on the live core needs to update this table to mark a record as done. I do this in a very awful way now, within a script

Re: index size with replication

2012-03-14 Thread Ahmet Arslan
Another note: if I reload the Solr app it goes back down in size. Here are my replication settings on the master: <requestHandler name="/replication" class="solr.ReplicationHandler"> <lst name="master"> <str name="replicateAfter">startup</str> <str name="replicateAfter">commit</str>

RE: index size with replication

2012-03-14 Thread Dyer, James
SOLR-3033 is related to ReplicationHandler's ability to do backups. It allows you to specify how many backups you want to keep. You don't seem to have any backups configured here so it is not an applicable parameter (note that SOLR-3033 was committed to trunk recently but the config param was

Re: index size with replication

2012-03-14 Thread Mike Austin
Thanks. I might just remove the optimize. I had it planned for once a week but maybe I'll just do it and restart the app if performance slows. On Wed, Mar 14, 2012 at 4:37 PM, Dyer, James james.d...@ingrambook.com wrote: SOLR-3033 is related to ReplicationHandler's ability to do backups. It

Re: index size with replication

2012-03-14 Thread Shawn Heisey
On 3/14/2012 2:54 PM, Mike Austin wrote: The odd thing is that if I optimize the index it doubles in size. If I then add one more document to the index it goes back down to half size? Is there a way to force this without needing to wait until another document is added? Or do you have more

Re: Solr core swap after rebuild in HA-setup / High-traffic

2012-03-14 Thread Shawn Heisey
On 3/14/2012 12:58 PM, KeesSchepers wrote: 1. I wipe the reindex core 2. I run the DIH on the complete dataset (4 million documents) in pieces of 20,000 records (to prevent very long MySQL locks) 3. After the DIH is finished (2 hours) we also have to update the rebuild core with changes

Responding to Requests with Chunks/Streaming

2012-03-14 Thread Nicholas Ball
Hello all, I've been working on a plugin with a custom component and a few handlers for a research project. Its aim is to do some interesting distributed work, however I seem to have come to a roadblock when trying to respond to a client's request in multiple steps. Not even sure if this is

Re: How to avoid the unexpected character error?

2012-03-14 Thread Li Li
No, it has nothing to do with schema.xml. post.jar just posts a file; it doesn't parse the file. Solr will use an XML parser to parse this file. If you don't escape special characters, it's not a valid XML file and Solr will throw exceptions. On Thu, Mar 15, 2012 at 12:33 AM, neosky neosk...@yahoo.com

create on multicore

2012-03-14 Thread Warren H. Prince
Every night I dump my MySQL db and load it into a development db. I have also configured Solr as multicore with production and development as the cores. In order to keep my index on development current, I figured I could do a create to a new core, transition, every night, and then swap

Re: index size with replication

2012-03-14 Thread Mike Austin
Shawn, Thanks for the detailed answer! I will play around with this information in hand. Maybe a second optimize or just a dummy commit after the optimize will help get me past this. Neither is the best option, but maybe it's a "do it because it's running on Windows" workaround. If it is indeed a

Re: Sort by bayesian function for 5 star rating

2012-03-14 Thread Mike Austin
Why don't you just use that formula and calculate the weighted rating for each movie and index that value? sort=wrating desc Maybe I didn't understand your question. mike On Mon, Mar 12, 2012 at 1:38 PM, Zac Smith z...@trinkit.com wrote: Does anyone have an example formula that can be used to
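The suggestion above is to compute the weighted rating at index time and store it in a sortable field. The snippet does not show which formula was meant, so the following sketch assumes a common Bayesian average: the item's mean rating is pulled towards a prior mean, with the prior counting as a fixed number of votes. The class name WeightedRating and the parameter names are hypothetical.

```java
/**
 * Hypothetical helper: computes a Bayesian-average rating to be indexed
 * in a field such as "wrating" and sorted on at query time.
 */
public final class WeightedRating {
    private WeightedRating() {}

    /**
     * @param meanRating  average rating of this item (e.g. 1 to 5 stars)
     * @param votes       number of ratings this item has received
     * @param priorMean   assumed average rating across all items
     * @param priorWeight assumed number of "virtual" votes given to the prior
     */
    public static double weightedRating(double meanRating, long votes,
                                        double priorMean, double priorWeight) {
        if (votes < 0 || priorWeight <= 0) {
            throw new IllegalArgumentException("votes must be >= 0 and priorWeight > 0");
        }
        // Weighted mean of the item's own ratings and the prior.
        return (votes * meanRating + priorWeight * priorMean) / (votes + priorWeight);
    }
}
```

An item with few votes stays close to priorMean, while an item with many votes approaches its own average. The value would need to be recomputed and reindexed whenever an item receives new ratings.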

Re: Solr out of memory exception

2012-03-14 Thread Li Li
How much memory is allocated to the JVM? On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar yhus...@firstam.com wrote: Solr is giving out of memory exception. Full Indexing was completed fine. Later while searching maybe when it tries to load the results in memory it starts giving this exception.

RE: Solr out of memory exception

2012-03-14 Thread Husain, Yavar
Thanks for helping me out. I have allocated Xms-2.0GB Xmx-2.0GB. However, I see Tomcat is still using very little memory and not 2.0G. Total memory on my Windows machine = 4GB. With a smaller index size it is working perfectly fine. I was thinking of increasing the system RAM tomcat heap space