Boosting documents with terms derived from clustering - good idea?

2013-05-14 Thread David Parks
We have a number of queries that produce good results based on the textual data, but are contextually wrong (for example, an SSD hard drive search matches the music album SSD hip hop drives us crazy). Textually a fair match, but SSD is a term that strongly relates to technical documents.

RE: More Like This and Caching

2013-05-09 Thread David Parks
I'm not the expert here, but perhaps what you're noticing is actually the OS's disk cache. The actual solr index isn't cached by solr, but as you read the blocks off disk the OS disk cache probably did cache those blocks for you. On the 2nd run the index blocks were read out of memory. There was

RE: Is the CoreAdmin RENAME method atomic?

2013-05-09 Thread David Parks
Find the discussion titled Indexing off the production servers just a week ago in this same forum, there is a significant discussion of this feature that you will probably want to review. -Original Message- From: Lan [mailto:dung@gmail.com] Sent: Friday, May 10, 2013 3:42 AM To:

RE: Solr Cloud with large synonyms.txt

2013-05-08 Thread David Parks
I can see your point, though I think edge cases would be one concern, if someone *can* create a very large synonyms file, someone *will* create that file. What would you set the zookeeper max data size to be? 50MB? 100MB? Someone is going to do something bad if there's nothing to tell them not

Indexing off of the production servers

2013-05-06 Thread David Parks
I've had trouble figuring out what options exist if I want to perform all indexing off of the production servers (I'd like to keep them only for user queries). We index data in batches roughly daily, ideally I'd index all solr cloud shards offline, then move the final index files to the solr

RE: Indexing off of the production servers

2013-05-06 Thread David Parks
performance will improve. 2013/5/6 David Parks davidpark...@yahoo.com I've had trouble figuring out what options exist if I want to perform all indexing off of the production servers (I'd like to keep them only for user queries). We index data in batches roughly daily

RE: Indexing off of the production servers

2013-05-06 Thread David Parks
So, am I following this correctly by saying that, this proposed solution would present us a way to index a collection on an offline/dev solr cloud instance and *move* that pre-prepared index to the production server using an alias/rename trick? That seems like a reasonably doable solution. I also

RE: Solr Cloud with large synonyms.txt

2013-05-06 Thread David Parks
Wouldn't it make more sense to only store a pointer to a synonyms file in zookeeper? Maybe just make the synonyms file accessible via http so other boxes can copy it if needed? Zookeeper was never meant for storing significant amounts of data. -Original Message- From: Jan Høydahl
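
A minimal sketch of the pointer idea above, assuming ZooKeeper's default znode size limit of roughly 1 MB (the jute.maxbuffer setting). The function name, the "pointer:" payload format, and the example URL are hypothetical; this is not an existing Solr feature.

```python
# Assumed default znode limit of about 1 MB (ZooKeeper's jute.maxbuffer).
ZNODE_LIMIT_BYTES = 1024 * 1024


def znode_payload(synonyms: bytes, http_url: str, limit: int = ZNODE_LIMIT_BYTES) -> bytes:
    """Return what to store in ZooKeeper: the file itself if small, else a pointer."""
    if len(synonyms) < limit:
        return synonyms
    # Too large for a znode: store only a reference other boxes can fetch over HTTP.
    return ("pointer:" + http_url).encode("utf-8")


small = b"ssd, solid state drive\n"
large = b"x" * (5 * 1024 * 1024)
url = "http://config.example.com/synonyms.txt"  # hypothetical location

print(len(znode_payload(small, url)), len(znode_payload(large, url)))
```

With this shape, ZooKeeper only ever holds a few bytes for a large synonyms file, whatever the file's real size.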

RE: Bug? JSON output changes when switching to solr cloud

2013-04-22 Thread David Parks
Subject: Re: Bug? JSON output changes when switching to solr cloud Thanks David, I've confirmed this is still a problem in trunk and opened https://issues.apache.org/jira/browse/SOLR-4746 -Yonik http://lucidworks.com On Sun, Apr 21, 2013 at 11:16 PM, David Parks davidpark...@yahoo.com wrote: We

Bug? JSON output changes when switching to solr cloud

2013-04-21 Thread David Parks
We just took an installation of 4.1 which was working fine and changed it to run as solr cloud. We encountered the most incredibly bizarre apparent bug: In the JSON output, a colon ':' changed to a comma ',', which of course broke the JSON parser. I'm guessing I should file this as a bug, but it

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
of data. If you only actually query over, say, 500MB of the 120GB data in your dev environment, you would only use 500MB worth of RAM for caching. Not 120GB On Fri, Apr 19, 2013 at 7:55 AM, David Parks davidpark...@yahoo.com wrote: Wow! That was the most pointed, concise discussion of hardware

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
19, 2013 4:19 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover On 4/19/2013 2:15 AM, David Parks wrote: Interesting. I'm trying to correlate this new understanding to what I see on my servers. I've got one server with 5GB dedicated to solr

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
Wow, thank you for those benchmarks Toke, that really gives me some firm footing to stand on in knowing what to expect and thinking out which path to venture down. It's tremendously appreciated! Dave -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent:

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover On 4/19/2013 3:48 AM, David Parks wrote: The Physical Memory is 90% utilized (21.18GB of 23.54GB). Solr has dark grey allocation of 602MB, and light grey of an additional 108MB, for a JVM total of 710MB
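
The figures quoted above can be checked with a little arithmetic. This sketch only uses the numbers from the message; reading the non-JVM remainder as OS disk cache is an assumption based on the rest of the thread, not something the numbers prove.

```python
# Numbers quoted in the message above.
physical_used_gb = 21.18
physical_total_gb = 23.54
jvm_dark_grey_mb = 602
jvm_light_grey_mb = 108

# Share of physical memory in use, rounded to a whole percent.
utilization_pct = round(100 * physical_used_gb / physical_total_gb)

# Total memory attributed to the Solr JVM.
jvm_total_mb = jvm_dark_grey_mb + jvm_light_grey_mb

# Used memory that is NOT the JVM (possibly OS disk cache; that is an assumption).
non_jvm_gb = physical_used_gb - jvm_total_mb / 1024

print(utilization_pct, jvm_total_mb, round(non_jvm_gb, 2))
```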

SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
Step 1: distribute processing We have 2 servers on which we'll run 2 SolrCloud instances. We'll define 2 shards so that both servers are busy for each request (improving response time of the request). Step 2: Failover We would now like to ensure that if either of the servers goes down
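
A sketch of the layout the two steps describe, assuming the failover goal is met by keeping a replica of every shard on each server. The function names and the round-robin placement are illustrative only; SolrCloud makes its own placement decisions.

```python
def layout(servers, num_shards, replication_factor):
    """Assign each shard's replicas to distinct servers, round-robin."""
    if replication_factor > len(servers):
        raise ValueError("need at least one server per replica of a shard")
    placement = {s: [] for s in servers}
    for shard in range(1, num_shards + 1):
        for r in range(replication_factor):
            server = servers[(shard - 1 + r) % len(servers)]
            placement[server].append(f"shard{shard}")
    return placement


def survives_loss(placement, lost, num_shards):
    """True if every shard still has a replica once `lost` is gone."""
    remaining = {sh for srv, shards in placement.items() if srv != lost for sh in shards}
    return len(remaining) == num_shards


plan = layout(["server1", "server2"], num_shards=2, replication_factor=2)
print(plan)
```

With 2 shards and 2 replicas per shard, each server holds both shards, so either one can go down without losing part of the index. With a single replica per shard, losing a server loses a shard.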

RE: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
AM, David Parks davidpark...@yahoo.com wrote: Step 1: distribute processing We have 2 servers on which we'll run 2 SolrCloud instances. We'll define 2 shards so that both servers are busy for each request (improving response time of the request). Step 2: Failover We would now like

RE: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
regardless of how you lay out the cluster otherwise performance will suffer. My guess is if each Solr had sufficient resources, you wouldn't actually notice much difference in query performance. Tim On Thu, Apr 18, 2013 at 8:03 AM, David Parks davidpark...@yahoo.com wrote: But my concern

RE: SolrCloud loadbalancing, replication, and failover

2013-04-18 Thread David Parks
[mailto:s...@elyograg.org] Sent: Friday, April 19, 2013 11:51 AM To: solr-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover On 4/18/2013 8:12 PM, David Parks wrote: I think I still don't understand something here. My concern right now is that query times

RE: MoreLikeThis - Odd results - what am I doing wrong?

2013-04-02 Thread David Parks
Isn't this an AWS security groups question? You should probably post this question on the AWS forums, but for the moment, here's the basic reading material - go set up your EC2 security groups and lock down your systems.

RE: Slow queries for common terms

2013-03-25 Thread David Parks
at 3:10 AM, David Parks davidpark...@yahoo.com wrote: I see the CPU working very hard, and at the same time I see 2 MB/sec disk access for that 15 seconds. I am not running it this instant, but it seems to me that there were more CPU cycles available, so unless it's an issue of not being able

RE: Slow queries for common terms

2013-03-23 Thread David Parks
I see the CPU working very hard, and at the same time I see 2 MB/sec disk access for that 15 seconds. I am not running it this instant, but it seems to me that there were more CPU cycles available, so unless it's an issue of not being able to multithread it any further I'd say it's more IO

Slow queries for common terms

2013-03-21 Thread David Parks
I've got a query that takes 15 seconds to return whenever I have the term book in a query that isn't cached. That's a pretty common term in our search index. We're indexing about 120 GB of text data. We only store terms and IDs, no document data, and the disk is virtually unused, it's all CPU

RE: Slow queries for common terms

2013-03-21 Thread David Parks
this situation. But the pure fact that only a few common search words trigger such a delay would suggest commongrams as a possible way forward. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 21. mars 2013 kl. 11:09 skrev David Parks davidpark
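
To illustrate what commongrams means here: a simplified sketch of the idea behind Solr's CommonGramsFilterFactory, which indexes bigrams for token pairs that involve a common word. This is not the Lucene implementation; the function name and word list are made up for the example.

```python
def common_grams(tokens, common_words):
    """Emit every token, plus a joined bigram wherever a pair contains a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")
    return out


terms = common_grams(["comic", "book", "store"], {"book"})
print(terms)
```

The benefit applies to phrase queries: a phrase such as comic book can be matched against the much rarer term comic_book instead of walking the long postings list for book. A bare single-term query for book would presumably not benefit.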

RE: Slow queries for common terms

2013-03-21 Thread David Parks
Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 21. mars 2013 kl. 12:43 skrev David Parks davidpark...@yahoo.com: We have 300M documents, each about a paragraph of text on average. The index is 140GB in size. I'm not sure how to find

Is Solr more CPU bound or IO bound?

2013-03-17 Thread David Parks
I'm spec'ing out some hardware for a first go at our production Solr instance, but I haven't spent enough time loadtesting it yet. What I want to ask is how IO intensive solr is vs. CPU intensive, typically. Specifically I'm considering whether to dual-purpose the Solr servers to run Solr

RE: Is Solr more CPU bound or IO bound?

2013-03-17 Thread David Parks
we'd be able to give you guidelines. Best, Manu On Mon, Mar 18, 2013 at 3:55 AM, David Parks davidpark...@yahoo.com wrote: I'm spec'ing out some hardware for a first go at our production Solr instance, but I haven't spent enough time loadtesting it yet. What I want to ask is how IO

RE: After upgrade to solr4, search doesn't work

2013-03-07 Thread David Parks
help on this, it certainly helped me get my configuration straight and the upgrade to 4 is now complete. All the best, David -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, March 06, 2013 7:56 PM To: solr-user@lucene.apache.org; David Parks Subject

After upgrade to solr4, search doesn't work

2013-03-05 Thread David Parks
I just upgraded from solr3 to solr4, and I wiped the previous work and reloaded 500,000 documents. I see in solr that I loaded the documents, and from the console, if I do a query *:* I see documents returned. I copied a single word from the text of the query results I got from *:* but any query

Re: After upgrade to solr4, search doesn't work

2013-03-05 Thread David Parks
Message- From: David Parks Sent: Wednesday, March 06, 2013 1:26 AM To: solr-user@lucene.apache.org Subject: After upgrade to solr4, search doesn't work I just upgraded from solr3 to solr4, and I wiped the previous work and reloaded 500,000 documents. I see in solr that I loaded the documents

Re: After upgrade to solr4, search doesn't work

2013-03-05 Thread David Parks
are Analysed and Indexed as per solr version 3.x On Wed, Mar 6, 2013 at 11:56 AM, David Parks davidpark...@yahoo.com wrote: I just upgraded from solr3 to solr4, and I wiped the previous work and reloaded 500,000 documents. I see in solr that I loaded the documents, and from the console, if I do

Re: After upgrade to solr4, search doesn't work

2013-03-05 Thread David Parks
="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/><filter class="solr.PorterStemFilterFactory"/></analyzer></fieldType> From: David Parks davidpark...@yahoo.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Wednesday, March 6, 2013 1:58 PM

Re: After upgrade to solr4, search doesn't work

2013-03-05 Thread David Parks
the default value of the df parameter in the /select request handler in solrconfig.xml to be your default query field name if it is not text. -- Jack Krupansky -Original Message- From: David Parks Sent: Wednesday, March 06, 2013 1:26 AM To: solr-user@lucene.apache.org Subject: After
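
Besides changing the default in solrconfig.xml, the df parameter can also be passed per request, which is a quick way to test whether a wrong default field is the cause. A minimal sketch; the base URL and the field name title are assumptions for illustration.

```python
from urllib.parse import urlencode


def select_url(base, query, default_field):
    """Build a /select URL that names the default query field explicitly via df."""
    params = {"q": query, "df": default_field, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


# "title" is a hypothetical field name; use whichever field holds the searchable text.
url = select_url("http://localhost:8983/solr", "doll", "title")
print(url)
```

If this query returns documents while the same query without df returns none, the handler's default field is the likely culprit.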

RE: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-18 Thread David Parks
? See http://search-lucene.com/?q=solr+join&fc_type=wiki Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Jan 17, 2013 at 8:04 PM, David Parks davidpark...@yahoo.com wrote: The documents are individual products which come from 1 or more vendors. Example: a 'toy spiderman doll

Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread David Parks
I want to configure Field Collapsing, but my target field is multi-valued (e.g. the field I want to group on has a variable # of entries per document, 1-N entries). I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that grouping doesn't support multi-valued fields yet. Anything in

RE: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-17 Thread David Parks
-user Subject: Re: Field Collapsing - Anything in the works for multi-valued fields? David, What are the documents and the field? It can help to suggest a workaround. On Thu, Jan 17, 2013 at 5:51 PM, David Parks davidpark...@yahoo.com wrote: I want to configure Field Collapsing, but my target field

Search strategy - improving search quality for short search terms such as doll

2013-01-16 Thread David Parks
I'm a beginner-intermediate solr admin, I've set up the basics for our application and it runs well. Now it's time for me to dig in and start tuning and improving queries. My next target is searches on simple terms such as doll which, in google, would return documents about, well, toy

RE: Search strategy - improving search quality for short search terms such as doll

2013-01-16 Thread David Parks
/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Jan 16, 2013 at 4:40 AM, David Parks davidpark...@yahoo.com wrote: I'm a beginner-intermediate solr admin, I've set up

RE: Search strategy - improving search quality for short search terms such as doll

2013-01-16 Thread David Parks
both queries have different context. Context-based search is, at some level, achieved by natural language processing. This is something you can look at for better search. The solr wiki and mailing list would be great sources of learning. Rgds AJ On 16-Jan-2013, at 15:10, David Parks davidpark...@yahoo.com

RE: MoreLikeThis supporting multiple document IDs as input?

2013-01-04 Thread David Parks
for either approach. -- Jack Krupansky -Original Message- From: David Parks Sent: Thursday, January 03, 2013 4:11 AM To: solr-user@lucene.apache.org Subject: RE: MoreLikeThis supporting multiple document IDs as input? I'm not seeing the results I would expect. In the previous email below it's

RE: MoreLikeThis supporting multiple document IDs as input?

2013-01-03 Thread David Parks
that you are wondering WHY they are different? That latter question I don't have the answer to. -- Jack Krupansky -Original Message- From: David Parks Sent: Friday, December 28, 2012 2:48 AM To: solr-user@lucene.apache.org Subject: RE: MoreLikeThis supporting multiple document IDs as input? So

What do I need to research to solve the problem of returning good results for a generic term?

2012-12-28 Thread David Parks
I'm sure this is a complex problem requiring many iterations of work, so I'm just looking for pointers in the right direction of research here. I have a base term, such as let's say black dress that I might search for. Someone searching on this term is most logically looking for black dresses.

RE: solr + jetty deployment issue

2012-12-27 Thread David Parks
Do you see any errors coming in on the console, stderr? I start solr this way and redirect the stdout and stderr to log files, when I have a problem stderr generally has the answer: java \ -server \ -Djetty.port=8080 \ -Dsolr.solr.home=/opt/solr \

MoreLikeThis only returns 1 result

2012-12-27 Thread David Parks
I'm doing a query like this for MoreLikeThis, sending it a document ID. But the only result I ever get back is the document ID I sent it. The debug response is below. If I read it correctly, it's taking id:1004401713626 as the term (not the document ID) and only finding it once. But I want it to

RE: MoreLikeThis only returns 1 result

2012-12-27 Thread David Parks
Or, simply address the MLT handler directly: http://107.23.102.164:8080/solr/mlt?q=... Or, use the MoreLikeThis search component: http://localhost:8983/solr/select?q=...&mlt=true&... See: http://wiki.apache.org/solr/MoreLikeThis -- Jack Krupansky -Original Message- From: David Parks
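
The two options above can be sketched as URL builders. This assumes an /mlt request handler is registered in solrconfig.xml and uses the standard MoreLikeThis parameters mlt, mlt.fl and mlt.count; the base URL and the field names are placeholders.

```python
from urllib.parse import urlencode


def mlt_handler_url(base, doc_query, fields, rows=10):
    """Query the dedicated MLT handler: q selects the source document."""
    params = {"q": doc_query, "mlt.fl": ",".join(fields), "rows": rows}
    return f"{base}/mlt?{urlencode(params)}"


def mlt_component_url(base, doc_query, fields, count=5):
    """Use the MoreLikeThis search component on a normal /select request."""
    params = {"q": doc_query, "mlt": "true", "mlt.fl": ",".join(fields), "mlt.count": count}
    return f"{base}/select?{urlencode(params)}"


base = "http://localhost:8983/solr"
fields = ["title", "description"]  # hypothetical field names
handler = mlt_handler_url(base, "id:1004401713626", fields)
component = mlt_component_url(base, "id:1004401713626", fields)
print(handler)
print(component)
```

With the handler, the response lists the similar documents directly; with the component, they come back in a separate section alongside the normal results for the query.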

RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-27 Thread David Parks
could POST that text back to the MLT handler and find similar documents using the posted text rather than a query. Kind of messy, but in theory that should work. -- Jack Krupansky -Original Message- From: David Parks Sent: Tuesday, December 25, 2012 5:04 AM To: solr-user@lucene.apache.org
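
A rough sketch of the approach described above: concatenate the text of the documents of interest and POST it to the MLT handler as the request body. It assumes an /mlt handler is registered and accepts posted text; the function name, base URL and field names are hypothetical. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def mlt_post_request(base, texts, fields):
    """Build (but do not send) a POST carrying the combined document text."""
    body = "\n".join(texts).encode("utf-8")
    params = urlencode({"mlt.fl": ",".join(fields)})
    return Request(
        f"{base}/mlt?{params}",
        data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


req = mlt_post_request(
    "http://localhost:8983/solr",
    ["first article text", "second article text"],
    ["title", "body"],  # hypothetical field names
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing req to urllib.request.urlopen against a running Solr instance.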

RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-27 Thread David Parks
will see how they are defined and used. HTH Otis Solr ElasticSearch Support http://sematext.com/ On Dec 28, 2012 12:06 AM, David Parks davidpark...@yahoo.com wrote: I'm somewhat new to Solr (it's running, I've been through the books, but I'm no master). What I hear you say is that MLT *can

RE: MoreLikeThis supporting multiple document IDs as input?

2012-12-26 Thread David Parks
could POST that text back to the MLT handler and find similar documents using the posted text rather than a query. Kind of messy, but in theory that should work. -- Jack Krupansky -Original Message- From: David Parks Sent: Tuesday, December 25, 2012 5:04 AM To: solr-user

MoreLikeThis supporting multiple document IDs as input?

2012-12-25 Thread David Parks
I'm unclear on this point from the documentation. Is it possible to give Solr X # of document IDs and tell it that I want documents similar to those X documents? Example: - The user is browsing 5 different articles - I send Solr the IDs of these 5 articles so I can present the user other