Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Ophir Adiv
[I posted this yesterday on the lucene-user mailing list and was advised to post it here instead; apologies for the cross-post.] Hi, I'm currently involved in a project migrating from Lucene 2.9.1 to Solr 1.4.0. During stress testing, I encountered this performance problem: While actual search times

Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420
I would like to set up Apache Solr in Eclipse using Tomcat. It is easy to set up with Jetty, but with Tomcat, Solr does not start at runtime. Has anyone done this before? Hando

Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Peter Karich
Ophir, this sounds a bit strange: CommonsHttpSolrServer.java, line 416 takes about 95% of the application's total search time Is this only for heavy load? Some other things: * with lucene you accessed the indices with MultiSearcher in a LAN, right? * did you look into the logs of the

RE: wildcard and proximity searches

2010-08-04 Thread Frederico Azeiteiro
Thanks for your idea. At this point I'm logging each query time. My idea is to divide my queries into normal queries and heavy queries. I have some heavy queries that take 1 or 2 minutes to return results, but they contain, for instance, (*word1* AND *word2* AND word3*). I guess that this will be always
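
The normal/heavy split described above could be sketched as follows. This is only an illustration, assuming that "heavy" means "contains a term with a leading wildcard" (leading wildcards force a scan over the whole term dictionary); the function name `is_heavy` is hypothetical and not part of Solr.

```python
import re

def is_heavy(query: str) -> bool:
    """Return True if any term in the query starts with a wildcard.

    Assumption: leading-wildcard terms such as *word1* are the slow ones;
    trailing wildcards such as word3* are treated as normal.
    """
    # Split on whitespace and parentheses to get the individual terms.
    terms = re.findall(r"[^\s()]+", query)
    # "*:*" is the match-all query, not a leading-wildcard term.
    return any(t.startswith("*") and t != "*:*" for t in terms)

heavy = is_heavy("(*word1* AND *word2* AND word3*)")
normal = is_heavy("word3*")
```

A logger could then tag each timed query with this flag so the two groups can be measured separately.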

AW: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Bastian Spitzer
I'm not sure I understand your problem, but basically it isn't Solr vs. Lucene but HttpURLConnection vs. SolrJ's CommonsHttpSolrServer, because server query times haven't changed at all, from what you say? Why aren't you querying the server the same way you did before when you want to compare Solr to

Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Ophir Adiv
On Wed, Aug 4, 2010 at 10:50 AM, Peter Karich peat...@yahoo.de wrote: Ophir, this sounds a bit strange: CommonsHttpSolrServer.java, line 416 takes about 95% of the application's total search time Is this only for heavy load? I think this makes sense, since the hard work is done by Solr

Is there a better for solor server side loadbalance?

2010-08-04 Thread Chengyang
The default Solr solution is client-side load balancing. Is there a solution that provides server-side load balancing?

Support loading queries from external files in QuerySenderListener

2010-08-04 Thread Stanislaw
Hi all! I can't load my custom queries from an external file, as described here: https://issues.apache.org/jira/browse/SOLR-784 This option seems not to be implemented in the current version of Solr, 1.4.1. Was it removed, or will it only arrive in a later version? Regards, Stanislaw

Re: Date faceting

2010-08-04 Thread Koji Sekiguchi
(10/08/04 19:42), Eric Grobler wrote: Hi Solr community, How do I facet on timestamp for example? I tried something like this - but I get no result. facet=true facet.date=timestamp f.facet.timestamp.date.start=2010-01-01T00:00:00Z f.facet.timestamp.date.end=2010-12-31T00:00:00Z
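
The reply itself is cut off in this listing, so the following is only a sketch of how the date-facet parameters are usually spelled, assuming Solr's per-field override form `f.<field>.<param>` and that a `facet.date.gap` is also supplied. The helper name `date_facet_params` and the `+1MONTH` gap are illustrative choices, not taken from the thread.

```python
from urllib.parse import urlencode

def date_facet_params(field, start, end, gap):
    """Build date-facet parameters using the per-field f.<field>.<param> form."""
    return {
        "q": "*:*",
        "facet": "true",
        "facet.date": field,
        # Per-field overrides put the field name right after "f.".
        f"f.{field}.facet.date.start": start,
        f"f.{field}.facet.date.end": end,
        f"f.{field}.facet.date.gap": gap,
    }

params = date_facet_params(
    "timestamp", "2010-01-01T00:00:00Z", "2010-12-31T00:00:00Z", "+1MONTH"
)
# urlencode escapes "+" and ":" so the values survive the HTTP request.
query_string = urlencode(params)
```

The resulting `query_string` can be appended to a `/select?` URL on the Solr server.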

Re: Best solution to avoiding multiple query requests

2010-08-04 Thread kenf_nc
Not sure the processing would be any faster than just querying again, but in your original result set, the first doc that has a field value matching a top 10 facet will be the number 1 item if you fq on that facet value. So you don't need to query it again. You would only need to query those

Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Geert-Jan Brits
Field Collapsing (currently as patch) is exactly what you're looking for imo. http://wiki.apache.org/solr/FieldCollapsing Geert-Jan 2010/8/4 Ken Krugler kkrugler_li...@transpac.com Hi all, I've got a situation where the key result from an initial

Re: Date faceting

2010-08-04 Thread Eric Grobler
Thanks Koji, It works :-) Have a nice day. regards ericz On Wed, Aug 4, 2010 at 12:08 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: (10/08/04 19:42), Eric Grobler wrote: Hi Solr community, How do I facet on timestamp for example? I tried something like this - but I get no result.

Re: Multi word synomyms

2010-08-04 Thread Qwerky
It would be nice if you could configure some kind of filter to be processed before the query string is passed to the parser. The QueryComponent class seems a nice place for this; a filter could be run against the raw query and ResponseBuilder's queryString value could be modified before the

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread jayendra patil
I have got Solr working in Eclipse and deployed on Tomcat through the Eclipse plugin. The crude approach was to: 1. Import the Solr WAR into Eclipse, which will be imported as a web project and can be deployed on Tomcat. 2. Add multiple source folders to the project, linked to the checked

analysis tool vs. reality

2010-08-04 Thread Justin Lolofie
Erik: Yes, I did re-index if that means adding the document again. Here are the exact steps I took: 1. analysis.jsp ABC12 does NOT match title ABC12 (however, ABC or 12 does) 2. changed schema.xml WordDelimiterFilterFactory catenate-all 3. restarted Tomcat 4. deleted the document with title ABC12

No group by? looking for an alternative.

2010-08-04 Thread Mickael Magniez
Hello, I have been dealing with a problem for a few days: I want to index and search shoes; each shoe can have several sizes and colors, at different prices. So what I want is: when I search for Converse, I want to retrieve one shoe per model, i.e. one color and one size, but having colors and sizes

Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
I think I agree with Justin here, I think the way analysis tool highlights 'matches' is extremely misleading, especially considering it completely ignores queryparsing. it would be better if it put your text in a memoryindex and actually parsed the query w/ queryparser, ran it, and used the

analysis tool vs. reality

2010-08-04 Thread Justin Lolofie
Wow, I got to work this morning and my query results now include the 'ABC12' document. I'm not sure what that means. Either I made a mistake in the process I described in the last email (I don't think this is the case) or there is some kind of caching of query results going on that doesn't get

Re: Support loading queries from external files in QuerySenderListener

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 3:27 PM, Stanislaw solrgeschic...@googlemail.com wrote: Hi all! I cant load my custom queries from the external file, as written here: https://issues.apache.org/jira/browse/SOLR-784 This option is seems to be not implemented in current version 1.4.1 of Solr. It was

Re: analysis tool vs. reality

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 7:52 PM, Robert Muir rcm...@gmail.com wrote: I think I agree with Justin here, I think the way analysis tool highlights 'matches' is extremely misleading, especially considering it completely ignores queryparsing. it would be better if it put your text in a memoryindex

Re: Is there a better for solor server side loadbalance?

2010-08-04 Thread Shalin Shekhar Mangar
2010/8/4 Chengyang atreey...@163.com The default Solr solution is client-side load balancing. Is there a solution that provides server-side load balancing? No. Most of us stick an HTTP load balancer in front of multiple Solr servers. -- Regards, Shalin Shekhar Mangar.

DIH and Cassandra

2010-08-04 Thread Mark
Is it possible to use DIH with Cassandra either out of the box or with something more custom? Thanks

Re: enhancing auto complete

2010-08-04 Thread Avlesh Singh
I preferred to answer this question privately earlier. But I have received innumerable requests to unveil the architecture. For the benefit of all, I am posting it here (after hiding as much info as I should, in my company's interest). The context: Auto-suggest feature on http://askme.in *Solr

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420
Thanks, man. I haven't tried this yet, but where do I put that XML configuration? Does it go in the web.xml in Solr? Cheers, Hando

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread jayendra patil
The Solr home is configured in the web.xml of the application, which points to the folder containing the conf files and the data directory: <env-entry> <env-entry-name>solr/home</env-entry-name> <env-entry-value>D:/multicore</env-entry-value> <env-entry-type>java.lang.String</env-entry-type>

can't use strdist as functionquery?

2010-08-04 Thread solr-user
I want to sort my results by how closely a given resultset field matches a given string. For example, say I am searching for a given product, and the product can be found in many cities including seattle. I want to sort the results so that results from city of seattle are at the top, and all

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420
Thanks, now it's clear and works fine. Regards, Hando

Re: Sharing index files between multiple JVMs and replication

2010-08-04 Thread Kelly Taylor
Is anybody else with a similar setup encountering these same issues? And is there a way to configure certain Solr web-apps as read-only (basically dummy instances) so that index changes are not allowed? - Original Message From: Kelly Taylor wired...@yahoo.com To:

Re: analysis tool vs. reality

2010-08-04 Thread Chris Hostetter
: I think I agree with Justin here, I think the way analysis tool highlights : 'matches' is extremely misleading, especially considering it completely : ignores queryparsing. It really only attempts to identify when there is overlap between analysis at query time and at indexing time so you

Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
Furthermore, I would like to add that it's not just the highlight-matches functionality that is horribly broken here; the output of the analysis itself is misleading. Let's say I take 'textTight' from the example and add the following synonym: this is broken = broke the query time analysis is

Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Ken Krugler
Hi Geert-Jan, On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote: Field Collapsing (currently as patch) is exactly what you're looking for imo. http://wiki.apache.org/solr/FieldCollapsing Thanks for the ref, good stuff. I think it's close, but if I understand this correctly, then I could

Re: DIH and Cassandra

2010-08-04 Thread Andrei Savu
DIH only works with relational databases and XML files [1], you need to write custom code in order to index data from Cassandra. It should be pretty easy to map documents from Cassandra to Solr. There are a lot of client libraries available [2] for Cassandra. [1]

Re: Is there a better for solor server side loadbalance?

2010-08-04 Thread Andrei Savu
Check this article [1], which explains how to set up haproxy to do load balancing. The steps are the same even if you are not using Drupal. By using this approach you can easily add more replicas without changing the application configuration files. You should also check SolrCloud [2] which does

Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread Tod
I'm running a slight variation of the example code referenced below and it takes a real long time to finally execute. In fact it hangs for a long time at solr.request(up) before finally executing. Is there anything I can look at or tweak to improve performance? I am also indexing a local

Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Geert-Jan Brits
If I understand correctly: you want to sort your collapsed results by 'nr of collapsed results'/ hits. It seems this can't be done out-of-the-box using this patch (I'm not entirely sure, at least it doesn't follow from the wiki-page. Perhaps best is to check the jira-issues to make sure this

Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Ken Krugler
Hi Geert-jan, On Aug 4, 2010, at 12:04pm, Geert-Jan Brits wrote: If I understand correctly: you want to sort your collapsed results by 'nr of collapsed results'/ hits. It seems this can't be done out-of-the-box using this patch (I'm not entirely sure, at least it doesn't follow from the

Indexing boolean value

2010-08-04 Thread PeterKerk
I'm trying to index a boolean location, but for some reason it does not show up in my indexed data. data-config.xml: <entity name="location" query="select * from locations"> <field name="id" column="ID" /> <field name="title" column="TITLE" /> <field name="city" column="CITY"

RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Michael Griffiths
Your schema.xml setting for the field is probably tokenizing the punctuation. Change the field type to one that doesn't tokenize on punctuation; e.g. use text_ws and not text -Original Message- From: PeterKerk [mailto:vettepa...@hotmail.com] Sent: Wednesday, August 04, 2010 3:36 PM To:

RE: Indexing boolean value

2010-08-04 Thread Michael Griffiths
I could be wrong, but I thought bit was an integer. Try changing fieldtype to integer. -Original Message- From: PeterKerk [mailto:vettepa...@hotmail.com] Sent: Wednesday, August 04, 2010 3:42 PM To: solr-user@lucene.apache.org Subject: Indexing boolean value Im trying to index a

RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk
I changed the values to text_ws. Now I only seem to have problems with field values that contain spaces; see below: <field name="city" type="text_ws" indexed="true" stored="true"/> <field name="theme" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" /> <field

RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma
You shouldn't fetch faceting results from analyzed fields, it will mess with your results. Search on analyzed fields but don't retrieve values from them.   -Original message- From: PeterKerk vettepa...@hotmail.com Sent: Wed 04-08-2010 22:15 To: solr-user@lucene.apache.org; Subject: RE:

RE: Indexing boolean value

2010-08-04 Thread PeterKerk
Hi, I tried that already, so that would make this: <field name="official" type="integer" indexed="true" stored="true"/> <copyField source="official" dest="text" /> (still not sure what copyField does, though). But even that won't work. I also don't see the officallocation columns indexed in the documents:

RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk
Sorry, but I'm a newbie to Solr... how would I change my schema.xml to match your requirements? And what do you mean by "it will mess with your results"? What will happen then?

RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Michael Griffiths
Echoing Markus - use the tokenized field to return results, but have a duplicate field of fieldtype=string to show the untokenized results. E.g. facet on that field. -Original Message- From: Markus Jelsma [mailto:markus.jel...@buyways.nl] Sent: Wednesday, August 04, 2010 4:18 PM To:
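
Why faceting on a tokenized field "messes with" the counts can be illustrated with a small simulation. This is not Solr code: it only mimics whitespace tokenization (as text_ws does) versus an untokenized string field, and the city values are made up for the example.

```python
from collections import Counter

# Hypothetical stored values of a "city" field, one per document.
cities = ["Den Haag", "Den Haag", "Amsterdam"]

def facet_counts_tokenized(values):
    """Facet counts when the field is split on whitespace: one count per token."""
    counts = Counter()
    for value in values:
        # Each distinct token of a document is counted once for that document.
        counts.update(set(value.split()))
    return counts

def facet_counts_string(values):
    """Facet counts when the field is kept whole, as a string field would be."""
    return Counter(values)

tokenized = facet_counts_tokenized(cities)
untokenized = facet_counts_string(cities)
```

With the tokenized field the facet list contains "Den" and "Haag" as separate entries; with the string copy it contains "Den Haag" as a single value, which is what a user expects to see in faceted navigation.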

RE: Indexing boolean value

2010-08-04 Thread Michael Griffiths
Copyfield copies the field so you can have multiple versions. Useful to dump all fields into one super field you can search on, for perf reasons. If the column isn't being indexed, I'd suggest the problem is in DIH. No suggestions as to why, I'm afraid. -Original Message- From:

RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma
Hmm, you should first read a bit more on schema design on the wiki and learn about indexing and querying Solr.   The copyField directive is what is commonly used in a faceted navigation system, search on analyzed fields, show faceting results using the primitive string field type. With

Re: DIH and Cassandra

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 9:11 PM, Mark static.void@gmail.com wrote: Is it possible to use DIH with Cassandra either out of the box or with something more custom? Thanks It will take some modifications but DIH is built to create denormalized documents so it is possible. Also see

RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk
Well the example you provided is 100% relevant to me :) I've read the wiki now (SchemaXml,SolrFacetingOverview,Query Syntax, SimpleFacetParameters), but still do not have an exact idea of what you mean. My situation: a city field is something that I want users to search on via text input, so

Re: DIH and Cassandra

2010-08-04 Thread Dennis Gearon
If data is stored in the index, isn't the index of Solr pretty much already a 'Big/Cassandra Table', except with tokenized columns to make searching easier? How are Cassandra/Big/Couch DBs doing text/weighted searching? Seems a real duplication to use Cassandra AND Solr. OTOH, I don't know how

Re: Is there a better for solor server side loadbalance?

2010-08-04 Thread Peter Karich
The default solr solution is client side loadbalance. Is there a solution provide the server side loadbalance? No. Most of us stick a HTTP load balancer in front of multiple Solr servers. E.g. mod_jk is a very easy solution (maybe too simple/stupid?) for a load balancer, but it

Re: Some basic DataImportHandler questions

2010-08-04 Thread harrysmith
Thanks, I think part of my issue may be that I am misunderstanding how to use the entity and field tags to import data in a particular format, and I am looking for a few more examples. Let's say I have a database table with 2 columns that contain metadata fields and values, and would like to import this

Re: No group by? looking for an alternative.

2010-08-04 Thread Lance Norskog
Hello- A way to do this is to create one faceting field that includes both the size and the color. I assume you have a different shoe product document for each model. Each model would include the color and size fields 'red' and '14a', but you would add a field with 'red-14a'. On Wed, Aug 4, 2010 at
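
The combined field suggested above could be built at indexing time roughly like this. It is a minimal sketch: the field names `color`, `size` and `color_size` and the helper `add_combined_facet` are assumptions for illustration, not names from the poster's schema.

```python
def add_combined_facet(doc, sep="-"):
    """Return a copy of the document with an extra color+size facet field."""
    combined = dict(doc)
    # e.g. color "red" and size "14a" become "red-14a".
    combined["color_size"] = f"{doc['color']}{sep}{doc['size']}"
    return combined

shoe = {"model": "Converse", "color": "red", "size": "14a"}
indexed = add_combined_facet(shoe)
```

Faceting on `color_size` then returns one count per color/size combination instead of separate counts for colors and sizes.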

Re: analysis tool vs. reality

2010-08-04 Thread Lance Norskog
there is some kind of caching of query results going on that doesn't get flushed on a restart of tomcat. Yes. Solr by default has HTTP caching on if there is no configuration, and the example solrconfig.xml has it configured on. You should edit solrconfig.xml to use the alternative described in

Re: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Erick Erickson
I suspect you're running afoul of tokenizers and filters. The parts of your schema that you published aren't the ones that really count. What you probably need to look at is the FieldType definitions, i.e. what analysis is done for, say, text_ws (see FieldType... in your schema). There you might

XML Format

2010-08-04 Thread twojah
<doc> <int name="AP_AUC_PHOTO_AVAIL">1</int> <double name="AUC_AD_PRICE">1.0</double> <int name="AUC_CLIENT_ID">27017</int> <str name="AUC_DESCR_SHORT">Bracket Ceiling untuk semua merk projector, panjang 60-90 cm Bahan Besi Cat Hitam = 325rb Bahan Sta</str> <str

Re: Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread jayendra patil
ContentStreamUpdateRequest seems to read the file contents and transfer it over http, which slows down the indexing. Try Using StreamingUpdateSolrServer with stream.file param @ http://wiki.apache.org/solr/SolrPerformanceFactors#Embedded_vs_HTTP_Post e.g. SolrServer server = new

how to take a value from the query result

2010-08-04 Thread twojah
This is my query in the browser address bar: http://172.16.17.126:8983/search/select/?q=AUC_ID:607136 and this is the result in the browser page: ... <doc> <int name="AP_AUC_PHOTO_AVAIL">1</int> <double name="AUC_AD_PRICE">1.0</double> <int name="AUC_CAT">576</int> <int name="AUC_CLIENT_ID">27017</int> <str