We have a number of queries that produce good results based on the textual
data, but are contextually wrong (for example, an SSD hard drive search
matches the music album "SSD hip hop drives us crazy").
Textually a fair match, but SSD is a term that strongly relates to technical
documents.
I'm not the expert here, but perhaps what you're noticing is actually the
OS's disk cache. The actual solr index isn't cached by solr, but as you read
the blocks off disk the OS disk cache probably did cache those blocks for
you. On the 2nd run the index blocks were read out of memory.
Find the discussion titled "Indexing off the production servers" from just a
week ago in this same forum; there is a significant discussion of this
feature that you will probably want to review.
-Original Message-
From: Lan [mailto:dung@gmail.com]
Sent: Friday, May 10, 2013 3:42 AM
To:
I can see your point, though I think edge cases would be one concern: if
someone *can* create a very large synonyms file, someone *will* create that
file. What would you set the zookeeper max data size to be? 50MB? 100MB?
Someone is going to do something bad if there's nothing to tell them not to.
I've had trouble figuring out what options exist if I want to perform all
indexing off of the production servers (I'd like to keep them only for user
queries).
We index data in batches roughly daily; ideally I'd index all solr cloud
shards offline, then move the final index files to the solr servers.
2013/5/6 David Parks davidpark...@yahoo.com
I've had trouble figuring out what options exist if I want to perform all
indexing off of the production servers (I'd like to keep them only for user
queries).
We index data in batches roughly daily
So, am I following this correctly: this proposed solution would give us a
way to index a collection on an offline/dev solr cloud instance and *move*
that pre-prepared index to the production server using an alias/rename
trick?
That seems like a reasonably doable solution. I also
Wouldn't it make more sense to only store a pointer to a synonyms file in
zookeeper? Maybe just make the synonyms file accessible via http so other
boxes can copy it if needed? Zookeeper was never meant for storing
significant amounts of data.
-Original Message-
From: Jan Høydahl
Subject: Re: Bug? JSON output changes when switching to solr cloud
Thanks David,
I've confirmed this is still a problem in trunk and opened
https://issues.apache.org/jira/browse/SOLR-4746
-Yonik
http://lucidworks.com
On Sun, Apr 21, 2013 at 11:16 PM, David Parks davidpark...@yahoo.com
wrote:
We just took an installation of 4.1 which was working fine and changed it to
run as solr cloud. We encountered the most incredibly bizarre apparent bug:
in the JSON output, a colon ':' changed to a comma ',', which of course
broke the JSON parser. I'm guessing I should file this as a bug, but it
If you only actually query over, say, 500MB of the 120GB data in your dev
environment, you would only use 500MB worth of RAM for caching, not 120GB.
On Fri, Apr 19, 2013 at 7:55 AM, David Parks davidpark...@yahoo.com wrote:
Wow! That was the most pointed, concise discussion of hardware
Sent: April 19, 2013 4:19 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud loadbalancing, replication, and failover
On 4/19/2013 2:15 AM, David Parks wrote:
Interesting. I'm trying to correlate this new understanding to what I
see on my servers. I've got one server with 5GB dedicated to solr
Wow, thank you for those benchmarks Toke, that really gives me some firm
footing to stand on in knowing what to expect and thinking out which path to
venture down. It's tremendously appreciated!
Dave
-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent:
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud loadbalancing, replication, and failover
On 4/19/2013 3:48 AM, David Parks wrote:
The Physical Memory is 90% utilized (21.18GB of 23.54GB). Solr has
dark grey allocation of 602MB, and light grey of an additional 108MB,
for a JVM total of 710MB
Step 1: distribute processing
We have 2 servers on which we'll run 2 SolrCloud instances.
We'll define 2 shards so that both servers are busy for each request
(improving response time of the request).
Step 2: Failover
We would now like to ensure that if either of the servers goes down
AM, David Parks davidpark...@yahoo.com wrote:
Step 1: distribute processing
We have 2 servers on which we'll run 2 SolrCloud instances.
We'll define 2 shards so that both servers are busy for each request
(improving response time of the request).
Step 2: Failover
We would now like
regardless of how you lay out the cluster; otherwise performance will
suffer. My guess is that if each Solr had sufficient resources, you wouldn't
actually notice much difference in query performance.
Tim
On Thu, Apr 18, 2013 at 8:03 AM, David Parks davidpark...@yahoo.com wrote:
But my concern
From: [mailto:s...@elyograg.org]
Sent: Friday, April 19, 2013 11:51 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud loadbalancing, replication, and failover
On 4/18/2013 8:12 PM, David Parks wrote:
I think I still don't understand something here.
My concern right now is that query times
Isn't this an AWS security groups question? You should probably post this
question on the AWS forums, but for the moment, here's the basic reading
material - go set up your EC2 security groups and lock down your systems.
at 3:10 AM, David Parks davidpark...@yahoo.com wrote:
I see the CPU working very hard, and at the same time I see 2 MB/sec
disk access for that 15 seconds. I am not running it this instant, but
it seems to me that there were more CPU cycles available, so unless
it's an issue of not being able
I see the CPU working very hard, and at the same time I see 2 MB/sec disk
access for that 15 seconds. I am not running it this instant, but it seems
to me that there were more CPU cycles available, so unless it's an issue of
not being able to multithread it any further, I'd say it's more IO
I've got a query that takes 15 seconds to return whenever I have the term
"book" in a query that isn't cached. That's a pretty common term in our
search index. We're indexing about 120 GB of text data. We only store terms
and IDs, no document data, and the disk is virtually unused; it's all CPU
this situation. But the pure fact that only a few
common search words trigger such a delay would suggest commongrams as a
possible way forward.
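The CommonGrams suggestion above would look roughly like the following in schema.xml. This is a minimal sketch, not the thread author's actual config: the field type name and the words file name (commonwords.txt) are placeholders, and you would add your own stemming and stopword filters around it. CommonGramsFilterFactory at index time pairs common words with their neighbors (e.g. "the book" becomes the token "the_book"), and CommonGramsQueryFilterFactory does the matching transformation at query time, so frequent terms like "book" stop hitting huge postings lists on their own:

```xml
<fieldType name="text_commongrams" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Emit word pairs for terms listed in commonwords.txt -->
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Rewrite queries to use the same common-gram pairs -->
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off is a larger index (every common-word pair becomes an extra term) in exchange for much cheaper queries on those few very frequent words.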
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 21 March 2013 at 11:09, David Parks davidpark
I've got a query that takes 15 seconds to return whenever I have the term
"book" in a query that isn't cached. That's a pretty common term in our search
index. We're indexing about 120 GB of text data. We only store terms and IDs,
no document data, and the disk is virtually unused; it's all CPU
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 21 March 2013 at 12:43, David Parks davidpark...@yahoo.com wrote:
We have 300M documents, each about a paragraph of text on average. The
index is 140GB in size. I'm not sure how to find
I'm spec'ing out some hardware for a first go at our production Solr
instance, but I haven't spent enough time loadtesting it yet.
What I want to ask is how IO intensive solr is vs. CPU intensive, typically.
Specifically I'm considering whether to dual-purpose the Solr servers to run
Solr
we'd be able to give you guidelines.
Best,
Manu
On Mon, Mar 18, 2013 at 3:55 AM, David Parks davidpark...@yahoo.com wrote:
I'm spec'ing out some hardware for a first go at our production Solr
instance, but I haven't spent enough time loadtesting it yet.
What I want to ask is how IO
help on this, it certainly helped me get my configuration straight and the
upgrade to 4 is now complete.
All the best,
David
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Wednesday, March 06, 2013 7:56 PM
To: solr-user@lucene.apache.org; David Parks
Subject
I just upgraded from solr3 to solr4, and I wiped the previous work and
reloaded 500,000 documents.
I see in solr that I loaded the documents, and from the console, if I do a
query *:* I see documents returned.
I copied a single word from the text of the query results I got from *:*
but any query
-Original Message-
From: David Parks
Sent: Wednesday, March 06, 2013 1:26 AM
To: solr-user@lucene.apache.org
Subject: After upgrade to solr4, search doesn't work
I just upgraded from solr3 to solr4, and I wiped the previous work and
reloaded 500,000 documents.
I see in solr that I loaded the documents
are Analysed and Indexed as per solr version 3.x
On Wed, Mar 6, 2013 at 11:56 AM, David Parks davidpark...@yahoo.com wrote:
I just upgraded from solr3 to solr4, and I wiped the previous work and
reloaded 500,000 documents.
I see in solr that I loaded the documents, and from the console, if I do
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/></analyzer></fieldType>
From: David Parks davidpark...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Wednesday, March 6, 2013 1:58 PM
Set the default value of the df parameter in the
/select request handler in solrconfig.xml to be your default query field name
if it is not "text".
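As a sketch, that change to solrconfig.xml would look something like this; "my_text_field" is a placeholder for whatever field your documents are actually indexed into, not a name from the thread:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Queries with no explicit field search this field by default -->
    <str name="df">my_text_field</str>
  </lst>
</requestHandler>
```

With this in place, a bare query like q=spiderman is interpreted as my_text_field:spiderman instead of searching the (possibly empty) "text" field.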
-- Jack Krupansky
-Original Message- From: David Parks
Sent: Wednesday, March 06, 2013 1:26 AM
To: solr-user@lucene.apache.org
Subject: After
See http://search-lucene.com/?q=solr+join&fc_type=wiki
Otis
--
Solr ElasticSearch Support
http://sematext.com/
On Thu, Jan 17, 2013 at 8:04 PM, David Parks davidpark...@yahoo.com wrote:
The documents are individual products which come from 1 or more vendors.
Example: a 'toy spiderman doll
I want to configure Field Collapsing, but my target field is multi-valued
(e.g. the field I want to group on has a variable # of entries per document,
1-N entries).
I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that
grouping doesn't support multi-valued fields yet.
Anything in
To: solr-user
Subject: Re: Field Collapsing - Anything in the works for multi-valued
fields?
David,
What are the documents and the field? Knowing that would help in suggesting a workaround.
On Thu, Jan 17, 2013 at 5:51 PM, David Parks davidpark...@yahoo.com wrote:
I want to configure Field Collapsing, but my target field
I'm a beginner-intermediate solr admin, I've set up the basics for our
application and it runs well.
Now it's time for me to dig in and start tuning and improving queries.
My next target is searches on simple terms such as "doll" which, in google,
would return documents about, well, toy
/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once.
Lately, it doesn't seem to be working. (Anonymous - via GTD book)
On Wed, Jan 16, 2013 at 4:40 AM, David Parks davidpark...@yahoo.com wrote:
I'm a beginner-intermediate solr admin, I've set up
both queries have different context.
Context-based search is at some level achieved by natural language processing.
You can look at that for better search.
The solr wiki and this mailing list would be a great source of learning.
Rgds
AJ
On 16-Jan-2013, at 15:10, David Parks davidpark...@yahoo.com
for either approach.
-- Jack Krupansky
-Original Message-
From: David Parks
Sent: Thursday, January 03, 2013 4:11 AM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?
I'm not seeing the results I would expect. In the previous email below it's
that you are wondering
WHY they are different? That latter question I don't have the answer to.
-- Jack Krupansky
-Original Message-
From: David Parks
Sent: Friday, December 28, 2012 2:48 AM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?
I'm sure this is a complex problem requiring many iterations of work, so I'm
just looking for pointers in the right direction of research here.
I have a base term, such as, let's say, "black dress", that I might search for.
Someone searching on this term is most logically looking for black dresses.
Do you see any errors coming in on the console, stderr?
I start solr this way and redirect the stdout and stderr to log files, when
I have a problem stderr generally has the answer:
java \
-server \
-Djetty.port=8080 \
-Dsolr.solr.home=/opt/solr \
I'm doing a query like this for MoreLikeThis, sending it a document ID. But
the only result I ever get back is the document ID I sent it. The debug
response is below.
If I read it correctly, it's taking id:1004401713626 as the term (not the
document ID) and only finding it once. But I want it to
Or, simply address the MLT handler directly:
http://107.23.102.164:8080/solr/mlt?q=...
Or, use the MoreLikeThis search component:
http://localhost:8983/solr/select?q=...&mlt=true&...
See:
http://wiki.apache.org/solr/MoreLikeThis
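For the first option (a dedicated /mlt handler), solrconfig.xml needs a handler registration along these lines. This is a minimal sketch, not config from the thread: the mlt.fl field name "text" and the parameter values are placeholder assumptions you would tune for your own schema:

```xml
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <!-- Field(s) to mine for "interesting" terms; must be stored or have termVectors -->
    <str name="mlt.fl">text</str>
    <!-- Lower the document/term frequency floors for small test indexes -->
    <int name="mlt.mindf">1</int>
    <int name="mlt.mintf">1</int>
  </lst>
</requestHandler>
```

Lowering mlt.mindf/mlt.mintf matters for the symptom described below: with the defaults, MLT silently discards terms that don't appear in enough documents, which on a small index can leave nothing to match on except the source document itself.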
-- Jack Krupansky
-Original Message-
From: David Parks
could POST that text back to the MLT handler and find similar documents
using the posted text rather than a query. Kind of messy, but in theory that
should work.
-- Jack Krupansky
-Original Message-
From: David Parks
Sent: Tuesday, December 25, 2012 5:04 AM
To: solr-user@lucene.apache.org
will see how they
are defined and used.
HTH
Otis
Solr ElasticSearch Support
http://sematext.com/
On Dec 28, 2012 12:06 AM, David Parks davidpark...@yahoo.com wrote:
I'm somewhat new to Solr (it's running, I've been through the books,
but I'm no master). What I hear you say is that MLT *can
could POST that text back to the MLT handler and find
similar documents using the posted text rather than a query. Kind of
messy, but in theory that should work.
-- Jack Krupansky
-Original Message-
From: David Parks
Sent: Tuesday, December 25, 2012 5:04 AM
To: solr-user
I'm unclear on this point from the documentation. Is it possible to give
Solr X # of document IDs and tell it that I want documents similar to those
X documents?
Example:
- The user is browsing 5 different articles
- I send Solr the IDs of these 5 articles so I can present the user other