Re: Identify exact search in edismax

2012-10-05 Thread Mikhail Khludnev
absolutely, that's what I didn't get in your initial question. Okay, it seems you are talking about a typical eCommerce search problem. I will speak about it at http://www.apachecon.eu/schedule/presentation/18/ see you. On Fri, Oct 5, 2012 at 9:47 AM, rhl4tr rhl4...@gmail.com wrote: But user query

Re: PriorityQueue:initialize consistently showing up as hot spot while profiling

2012-10-05 Thread Mikhail Khludnev
what's the value of rows param http://wiki.apache.org/solr/CommonQueryParameters#rows ? On Fri, Oct 5, 2012 at 6:56 AM, Aaron Daubman daub...@gmail.com wrote: Greetings, I've been seeing this call chain come up fairly frequently when debugging longer-QTime queries under Solr 3.6.1 but have

Adding a new pseudo field

2012-10-05 Thread deniz
Hi all, I want to have a field for each document which will simply store the doc's position (rank, not its score) for each query, so for each different query it will show the doc's new rank within the whole search result... I have been digging through the source code (4.0 Beta) but for now couldn't

Encryption of dataConfig section fields in SOLR

2012-10-05 Thread aniljayanti
Hi, I'm generating SOLR from DB with the dataConfig section below in my data-config.xml file, and it's working fine. <dataConfig> <dataSource type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" burl="jdbc:sqlserver://127.0.0.1;databaseName=emp" user="user" password="user*"/>

Re: PriorityQueue:initialize consistently showing up as hot spot while profiling

2012-10-05 Thread Aaron Daubman
On Fri, Oct 5, 2012 at 4:33 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: what's the value of rows param http://wiki.apache.org/solr/CommonQueryParameters#rows ? Very interesting question - so, for historic reasons lost to me, we pass in a huge (1000?) number for rows and this hits

Re: Identify exact search in edismax

2012-10-05 Thread Mikhail Khludnev
I have only pencil scratches yet, can't share it. I can say that I've found it quite close to the approach described at http://www.ulakha.com/publications.html; it's called Concept Search there, but as far as I understand I have a rather different implementation approach. On Fri, Oct 5, 2012 at 2:31

Re: PriorityQueue:initialize consistently showing up as hot spot while profiling

2012-10-05 Thread Mikhail Khludnev
okay. huge rows value is no.1 way to kill Lucene. It's not possible, absolutely. You need to rethink logic of your component. Check Solr's FieldCollapsing code, IIRC it makes second search to achieve similar goal. Also check PostFilter and DelegatingCollector classes, their approach can also be

Re: Adding a new pseudo field

2012-10-05 Thread Jack Krupansky
I'm a little confused about what you actually expect to see. I mean, it sounds like all you are doing is numbering N query results as positions 1..N. But that's too obvious to be useful. Maybe you could provide an example. Or are you talking about query refinement, where you do one query and

Re: SolrCloud - replication factor

2012-10-05 Thread Erick Erickson
I _think_ I have this right... ReplicationFactor is the maximum number of extra replicas per shard. If you don't specify this, then as you bring up more and more nodes, the new nodes get assigned on a round-robin basis to shards. This allows you to have heterogeneous collections and not have

SolrCloud graph shard colors and meanings

2012-10-05 Thread Kristopher Kane
Can anyone point to a document that describes the meanings behind the different solrcloud graph shard colors? I have several that are orange now with two as the active shard and our total index count is less than it was a day before. The logs aren't indicating anything in particular. Thanks,

Re: Get report of keywords searched.

2012-10-05 Thread Rajani Maski
Hi, Thank you for the reply Davide. By writing to the db, do you mean inserting the search queries into the db? I was thinking that this might affect search performance? Yes, you are right, getting stats for a particular keyword is tough. It would suffice if I can get q param and fq param values( when we

Re: Get report of keywords searched.

2012-10-05 Thread Davide Lorenzo Marino
If you think this could be a problem for your performance you can try two different solutions: 1 - Make the call to update the db in a different thread 2 - Make an asynchronous http call to a web application that updates the db (in this case the web app can reside on a different machine, so
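
Editor's sketch of option 1 above (updating the db from a different thread), in Python for brevity. It is only an illustration under stated assumptions: the names `log_search`, `flush_and_stop` and `logged_queries` are hypothetical, and an in-memory list stands in for the database, so the search path only pays for a queue insert.

```python
import queue
import threading

# Hypothetical stand-in for the database table; a real setup would
# write to a DB connection here instead of an in-memory list.
logged_queries = []

_log_queue = queue.Queue()


def _writer():
    """Drain the queue in the background so searches never wait on the log."""
    while True:
        entry = _log_queue.get()
        if entry is None:  # sentinel: stop the worker
            _log_queue.task_done()
            break
        logged_queries.append(entry)
        _log_queue.task_done()


_worker = threading.Thread(target=_writer, daemon=True)
_worker.start()


def log_search(q, fq=None):
    """Called from the search path; only enqueues and returns immediately."""
    _log_queue.put({"q": q, "fq": fq})


def flush_and_stop():
    """Wait for pending entries to be written, then stop the worker."""
    _log_queue.put(None)
    _log_queue.join()
    _worker.join()
```

A single writer thread keeps the entries in arrival order; whether that matters depends on the report being built.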

Re: Proximity(tilde) combined with wildcard, AutomatonQuery ?

2012-10-05 Thread Ahmet Arslan
Hi Vadim, I attached a zip (solr plugin) file to SOLR-1604. This is not a patch. This is supposed to work with solr 4.0. Some tests fail but it should work with pol* tel*~5 types of queries. Ahmet --- On Thu, 9/27/12, Vadim Kisselmann v.kisselm...@gmail.com wrote: From: Vadim Kisselmann

Re: SolrCloud graph shard colors and meanings

2012-10-05 Thread Stefan Matheis
Hey Kris, right now there is no specific document .. but we could perhaps add a kind of legend on this screen? .. in the meanwhile, does this help? http://svn.apache.org/viewvc/lucene/dev/trunk/solr/webapp/web/css/styles/cloud.css?view=markup#l259 The used css-classname is what we get from

SOLR 4.0 Beta documents being duplicated

2012-10-05 Thread David Quarterman
Hi, We've been using V4.x of SOLR since last November without too much trouble. Our MySQL database is refreshed daily and a full import is run automatically after the refresh and generally produces around 86,000 products, obviously on unique doc_id's. So, we upgraded to 4.0 Beta a few days

Re: Problem with relating values in two multi value fields

2012-10-05 Thread Torben Honigbaum
Hi Mikhail, I read the article and can't see how to solve my problem with FieldCollapsing. Any other suggestions? Torben Am 04.10.2012 um 17:31 schrieb Mikhail Khludnev: it's a typical nested document problem. there are several approaches. Out of the box solution as far you need facets is

Re: SolrCloud graph shard colors and meanings

2012-10-05 Thread Erick Erickson
A legend would be awesome, I'm vastly in favor of not having to go to external docs. Tooltip would work too. whichever is easier... Best Erick On Fri, Oct 5, 2012 at 10:24 AM, Stefan Matheis matheis.ste...@gmail.com wrote: Hey Kris Right now there is no specific Document .. but we could

Re: SOLR 4.0 Beta documents being duplicated

2012-10-05 Thread Erick Erickson
How are you indexing? There was a problem with indexing from SolrJ if you indexed documents in batches, server.add(doclist) that's fixed in 4.0 RC#. The work-around is to add docs singly, server.add(doc) Second thing. Bad Things Happen if you don't have a _version_ field in your schema.xml. Solr

Re: Problem with relating values in two multi value fields

2012-10-05 Thread Mikhail Khludnev
denormalize your docs to option x value tuples, identify them by duping id. <doc> <str name="setid">3</str> <str name="options">A</str> <str name="value">200</str> </doc> <doc> <str name="setid">3</str> <str name="options">B</str> <str name="value">400</str> </doc> <doc> <str name="setid">3</str> <str name="options">B</str> <str
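
Editor's sketch of the denormalization described above, in Python. It assumes (this is not stated in the snippet) that the original document carries two parallel multi-valued fields where the i-th option belongs to the i-th value; the helper name `denormalize` is hypothetical, while the field names `setid`, `options` and `value` are taken from the example docs.

```python
def denormalize(set_id, options, values):
    """Turn one doc with parallel multi-valued fields into one doc per pair.

    Every output doc repeats the same set id, so the tuples can still be
    tied back to the document they came from.
    """
    if len(options) != len(values):
        raise ValueError("options and values must have the same length")
    return [
        {"setid": set_id, "options": opt, "value": val}
        for opt, val in zip(options, values)
    ]


# The first two tuples from the example above.
docs = denormalize("3", ["A", "B"], [200, 400])
```

With this shape a query such as options:A AND value:200 can only match a tuple where both conditions hold for the same pair.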

RE: SOLR 4.0 Beta documents being duplicated

2012-10-05 Thread David Quarterman
Thanks Erick. We've added the '_version_' and we'll see if that makes a difference tomorrow. Also, have downloaded the RC1 and will try that next week. Regards, David Q -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 05 October 2012 15:40 To:

Speed Up Query to Solr

2012-10-05 Thread Sushil jain
Hi, I am using EmbeddedSolrServer to access indexed data. I have to query the server around 250K times, with a different query each time. I have already created the queries. But every query to solr takes time. I am querying using threads and a loop, but still it's not so fast. Is there any way to speed

Re: Unknown format version: -11

2012-10-05 Thread Sushil jain
It's working fine on the server. The problem was on my local PC, which might have occurred because of some misconfiguration. Thank you very much. On Fri, Oct 5, 2012 at 11:23 AM, Sushil jain jain.ayushm...@gmail.com wrote: I am using Solr 1.4.1 and same solr is indexing the documents, I have

Re: SolrCloud - replication factor

2012-10-05 Thread Tomás Fernández Löbbe
I think that's correct, but only when creating a new collection. I don't know if the replication factor is considered after that (running more nodes that have a core with the collection name, or manually adding nodes to the collection), or if some nodes go down. Also, please someone correct me if

Re: One index or multiple?

2012-10-05 Thread Erick Erickson
The very first question is what form are your XML docs in? Solr does NOT index arbitrary XML, so I'm guessing you're using DIH and some of the xml stuff there. Do note that the XSLT is a subset of the full capabilities. Second, I'd recommend you just put it all in a single index, it'll be

Re: Speed Up Query to Solr

2012-10-05 Thread Erick Erickson
Here's a reference, much of it is at the Lucene layer, but it might be helpful. http://wiki.apache.org/lucene-java/ImproveSearchingSpeed If I'm reading this right, you want to get through 250K queries. What kind of throughput are you seeing? What is your target speed? I suspect you're going to

Re: segment number during optimize of index

2012-10-05 Thread jame vaalet
hi Shawn, thanks for the detailed explanation. I have got one doubt: you said it doesn't matter how many segments the index has, but then why does solr have this merge policy which merges segments frequently? why can't it leave the segments as they are rather than merging smaller ones into bigger ones?

Re: SolrCloud graph shard colors and meanings

2012-10-05 Thread Kristopher Kane
Right now there is no specific document .. but we could perhaps add a kind of legend on this screen? .. in the meanwhile, does this help? http://svn.apache.org/viewvc/lucene/dev/trunk/solr/webapp/web/css/styles/cloud.css?view=markup#l259 The used css-classname is what we get from

RE: SolrJ - IOException

2012-10-05 Thread balaji.gandhi
Hi Toke, Were you able to find anything on this issue? We are running at 30 TPS and using the default HttpSolrServer for the posts. Thanks, Balaji Balaji Gandhi, Senior Software Developer, Horizontal Platform Services Product Engineering │ Apollo Group,

Re: segment number during optimize of index

2012-10-05 Thread Erick Erickson
because eventually you'd run out of file handles. Imagine a long-running server with 100,000 segments. Totally unmanageable. I think Shawn was emphasizing that RAM requirements don't depend on the number of segments. There are other resources that files consume, however. Best Erick On Fri, Oct 5,

Re: One index or multiple?

2012-10-05 Thread Billy Newman
Erick, I did mention using the DIH to index the first two datasets; that is where the root of my problem lies. I do see the benefit of one index. However the question still remains, can I use the DIH to index xml from data set 1 and 2, every 15 minutes or so (full index) without wiping out

Re: Question about OR operator

2012-10-05 Thread Jorge Luis Betancourt Gonzalez
Thanks a lot for all the replies, Chris it worked out with this mm value: str name=mm 10% /str If this version of solr is affected by the bug you pointed out, shouldn't it fail with this value as well? Greetings! On Oct 4, 2012, at 8:48 PM, Jorge Luis Betancourt Gonzalez wrote: Hi Chris:

Re: multivalued filed question (FieldCache error)

2012-10-05 Thread Chris Hostetter
: So extracting the attachment you will be able to track down what happens : : this is the query that shows the error, and below you can see the latest stack : trace and the qt definition Awesome -- exactly what we needed. I've reproduced your problem, and verified that it has something to do

Re: One index or multiple?

2012-10-05 Thread Erick Erickson
DIH always gives me indigestion. Couple of things: See the 'clean' parameter here for full import: http://wiki.apache.org/solr/DataImportHandler it defaults to true. I think if you set it to false _and_ assuming that your uniqueKey is defined, it should work OK. The other approach would be
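
Editor's sketch of a full-import request with clean=false, built in Python. The `command=full-import` and `clean` parameters are the ones documented on the DataImportHandler wiki page linked above; the host, port and the `/dataimport` handler path are assumptions about a typical setup, and `full_import_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def full_import_url(base_url, clean=False):
    """Build a DIH full-import URL; clean=False keeps existing documents."""
    params = {
        "command": "full-import",
        "clean": "true" if clean else "false",
    }
    # "/dataimport" is assumed; use whatever path the handler is registered at.
    return f"{base_url}/dataimport?{urlencode(params)}"


# Assumed local Solr base URL, for illustration only.
url = full_import_url("http://localhost:8983/solr")
```

As the message notes, this only behaves well if a uniqueKey is defined, so that re-imported documents replace their earlier versions.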

Need to update a field without re-indexing in solr 3.6

2012-10-05 Thread Thakur, Pramila
Hi Everyone, I am using Solr 3.6. I want to update a single field value in the index without re-indexing. Is this possible? I have googled and came across partial update in solr 4.0 BETA. Can I do this with Solr 3.6? Thanks, -- Pramila Thakur

Re: One index or multiple?

2012-10-05 Thread Walter Underwood
Using the same unique key doesn't handle documents which disappear from one indexing to the next. Instead, add a field for the type of item, like type:animal, type:vegetable, or type:mineral. Then the query used to clean up before indexing can delete all items of that type. wunder On Oct 5,
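
Editor's sketch of the per-type cleanup suggested above, in Python. It builds a delete-by-query message in Solr's XML update format; the field name `type` and its values come from the message, while the helper name `delete_by_type_xml` is hypothetical and the step of posting the body to the update handler is left out.

```python
from xml.sax.saxutils import escape


def delete_by_type_xml(item_type):
    """Build an XML update message deleting every document of one type."""
    # escape() guards against &, < and > in the value; it does not
    # escape Lucene query syntax, so keep type values simple tokens.
    return f"<delete><query>type:{escape(item_type)}</query></delete>"


body = delete_by_type_xml("vegetable")
```

Sending this before re-indexing one data set removes documents that have disappeared from that source without touching the other types.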

Re: Need to update a field without re-indexing in solr 3.6

2012-10-05 Thread Otis Gospodnetic
Hi, This is not doable in Solr 3.*. There are Lucene-level patches in JIRA, but I'm not sure if they are in Solr 4.* Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Fri, Oct 5, 2012 at 3:02 PM, Thakur,

Re: Getting some strange errors!!!

2012-10-05 Thread Otis Gospodnetic
Looks like HttpClient jar is not in your CLASSPATH or in -cp. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Fri, Oct 5, 2012 at 3:33 PM, Prithu Banerjee prid...@gmail.com wrote: I have been using solrJ

Re: Getting some strange errors!!!

2012-10-05 Thread Prithu Banerjee
Ok ok thanks a lot Otis. This has been bothering me for a long while. Thanks a ton. On Sat, Oct 6, 2012 at 1:05 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Looks like HttpClient jar is not in your CLASSPATH or in -cp. Otis -- Search Analytics -

Re: Need to update a field without re-indexing in solr 3.6

2012-10-05 Thread Mikhail Khludnev
Could you please tell me more. What field do you need to update, how it influences the search results, how often, and why you can not afford commit? On Fri, Oct 5, 2012 at 11:14 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, This is not doable in Solr 3.*. There are Lucene-level

Re: Speed Up Query to Solr

2012-10-05 Thread Sushil jain
Thank you Erick for your quick response. Yes, you are right about my problem. and indexes are on same machine and yes I am using single machine I am using EmbeddedSolrServer class of SolrJ which removes HTTP layer. But still it takes time. On Fri, Oct 5, 2012 at 10:19 PM, Erick Erickson

Re: SolrJ - IOException

2012-10-05 Thread Sushil jain
Balaji, What is 30 TPS? Toke, You should use EmbeddedSolrServer instead. On Fri, Oct 5, 2012 at 11:42 PM, balaji.gandhi balaji.gan...@apollogrp.edu wrote: Hi Toke, Were you able to find anything on this issue? We are running at 30 TPS and using the default HttpSolrServer for the posts.

RE: SolrJ - IOException

2012-10-05 Thread balaji.gandhi
Sushil, 30 TPS = 30 transactions (updates) per second. Is the recommendation to use EmbeddedSolrServer instead of HttpSolrServer? Thanks, Balaji Balaji Gandhi, Senior Software Developer, Horizontal Platform Services Product Engineering │ Apollo Group, Inc. 1225 W. Washington St. | AZ23 |

Re: Proximity(tilde) combined with wildcard, AutomatonQuery ?

2012-10-05 Thread Vadim Kisselmann
Hi Ahmet, thank you, it sounds great:) I will test it in the next days and give feedback. Best regards Vadim 2012/10/5 Ahmet Arslan iori...@yahoo.com: Hi Vadim, I attached a zip (solr plugin) file to SOLR-1604. This is not a patch. This is supposed to work with solr 4.0. Some tests fail but

Re: Adding a new pseudo field

2012-10-05 Thread Otis Gospodnetic
Hi, I think you should store this outside of Solr, in a DB or file or Redis (key is doc ID, value is a query=position map) or ... Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Fri, Oct 5, 2012 at 5:13
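
Editor's sketch of the structure suggested above (key is doc ID, value is a query-to-position map), in Python. A plain in-memory dict stands in for the DB, file or Redis store; `rank_store` and `record_ranks` are hypothetical names, and the doc IDs and queries are made up for illustration.

```python
from collections import defaultdict

# doc ID -> {query string -> 1-based rank in that query's results}
rank_store = defaultdict(dict)


def record_ranks(query, ranked_doc_ids):
    """Store each document's position for one executed query."""
    for position, doc_id in enumerate(ranked_doc_ids, start=1):
        rank_store[doc_id][query] = position


# Illustrative result lists, already ordered by rank.
record_ranks("ipod", ["doc7", "doc2", "doc9"])
record_ranks("ipod case", ["doc2", "doc7"])
```

Looking up `rank_store["doc2"]` then gives that document's rank for every query it appeared in.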

Re: SolrJ - IOException

2012-10-05 Thread Sushil jain
Yes, I'd recommend EmbeddedSolrServer, because it doesn't require any web server for read/write/update/delete operations. On Sat, Oct 6, 2012 at 1:48 AM, balaji.gandhi balaji.gan...@apollogrp.edu wrote: Sushil, 30 TPS = 30 transactions (updates) per second. Is the recommendation to use

Re: SolrCloud graph shard colors and meanings

2012-10-05 Thread Stefan Matheis
Already started .. if you want to follow and give feedback :) https://issues.apache.org/jira/browse/SOLR-3915 On Friday, October 5, 2012 at 7:53 PM, Kristopher Kane wrote: I also vote for a legend on the monitor.

Re: segment number during optimize of index

2012-10-05 Thread jame vaalet
Hi Eric, I am in a major dilemma with my index now. I have got 8 cores, each around 300 GB in size; half of the documents in them are deleted, and on top of that each has around 100 segments as well. Do I issue an expungeDelete and allow the merge policy to take care of the segments or optimize

RE: SolrJ - IOException

2012-10-05 Thread balaji.gandhi
Sushil, we are trying to call the VIP in front of the SOLR nodes to distribute the update load. Also is EmbeddedSolrServer thread safe? Balaji Gandhi, Senior Software Developer, Horizontal Platform Services Product Engineering │ Apollo Group, Inc. 1225 W. Washington St. | AZ23 | Tempe, AZ

Re: SolrJ - IOException

2012-10-05 Thread Sushil jain
If you need to use solr in an embedded application, this is the recommended approach. It allows you to work with the same interface whether or not you have access to HTTP. And it is not thread safe. On Sat, Oct 6, 2012 at 1:58 AM, balaji.gandhi balaji.gan...@apollogrp.edu wrote: Sushil, we are

Re: Speed Up Query to Solr

2012-10-05 Thread Erick Erickson
But look what you're asking Solr to do. 250K queries. Let's say you get 100 QPS, which for a single box isn't bad. That's still 2,500 seconds, roughly 40 minutes. But you still haven't told us what QPS you're seeing. Or what you need to see. Or what kind of results you need from your queries.
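
Editor's sketch of the arithmetic above, in Python, so other throughput figures can be plugged in. The 250K queries and 100 QPS are the numbers from the message; the function name `total_seconds` is hypothetical.

```python
def total_seconds(num_queries, qps):
    """Time to run num_queries at a sustained rate of qps queries/second."""
    return num_queries / qps


seconds = total_seconds(250_000, 100)  # 2500.0 seconds
minutes = seconds / 60                 # a little under 42 minutes
```

This assumes the rate is sustained for the whole run, which is why the actual QPS being observed matters so much to the answer.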

Re: segment number during optimize of index

2012-10-05 Thread Erick Erickson
My first reaction is you have too much stuff on a single machine. Your cumulative index size is 2.4 TB. Granted, it's a beefy machine, but still... And index size isn't all that helpful, as it includes the raw stored data which doesn't really come into play for sizing things, subtract out the

Re: SolrJ - IOException

2012-10-05 Thread Erick Erickson
Well, using embedded Solr isn't necessarily indicated. I have a couple of questions. 1 you say 30 tps. Are you sending a single doc at a time or batching them up? I.e. server.add(doclist) or server.add(doc)? 2 Http isn't actually an inefficient protocol, I think the whole idea of using embedded

Re: One index or multiple?

2012-10-05 Thread Billy Newman
Does DIH support only deleting/re-indexing docs of a certain type? I.E. can I have a DIH for type:vegetable and another for type:mineral and each only deletes/recreates the right types? Thanks. On Fri, Oct 5, 2012 at 1:04 PM, Walter Underwood wun...@wunderwood.org wrote: Using the same unique

Re: SolrCloud - replication factor

2012-10-05 Thread Jan Høydahl
Mr. Miller said that it depends. If you create your collection with the collections api, then replicationFactor will only see the currently live nodes, not nodes started later. However, collections added to solr.xml on all nodes will participate in auto role assignment for new nodes started. I

Re: Problem with relating values in two multi value fields

2012-10-05 Thread Torben Honigbaum
Hi Mikhail, thank you for your answer. Maybe my sample data was not so good. The documents always have additional data which I need to use as facets, like this: <doc> <str name="id">3</str> <str name="attribute_A">value</str> <str name="attribute_B">value</str> <str name="options"> <str>A</str> <str>B</str>

queryResultWindowSize vs rows

2012-10-05 Thread Jie Sun
what will happen if in my query I specify a greater number for rows than the queryResultWindowSize in my solrconfig.xml? for example, if queryResultWindowSize=100, but I need to process a batch query from solr with rows=1000 each time and vary the start to move on... what will happen? if I do not turn

Re: segment number during optimize of index

2012-10-05 Thread Otis Gospodnetic
If I were you and not knowing all your details... I would optimize indices that are static (not being modified) and would optimize down to 1 segment. I would do it when search traffic is low. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring -