2012/12/12 David Smiley (@MITRE.org) dsmi...@mitre.org
britske wrote
Hi David,
Yeah, interesting (as well as problematic as far as implementing goes) use-case
indeed :)
1. You mention there are no special caches / memory requirements inherent
in this. For a given user-query this would
In general you probably want to add a parameter distrib=true to your
search requests.
adm1n wrote:
I have 1 collection called index.
I created it like explained here: http://wiki.apache.org/solr/SolrCloud in
Example A: Simple two shard cluster section
here are the start up commands:
1)java
Hi,
Gian Maria Ricci - aka Alkampfer wrote:
Hi everyone, I have a Solr 3.6 server up and running; now I wish to install
Solr 4 on the same machine, if possible in a side-by-side configuration
(Tomcat on Windows). Is it possible, and is there some documentation on how to do
this? Thanks a lot.
Hi Erick,
Thanks for replying. On the subject of commit vs optimize: for the
moment I'm actually replacing the entire index each time beginning
with a delete *:*, so I think doing an optimize is actually ok, as it
is essentially a new index anyway. Ultimately, I think I'll want to
be doing
Try using the URL-escaped % symbol instead.
Otis
--
SOLR Performance Monitoring - http://sematext.com/spm
On Dec 12, 2012 5:20 AM, Xi Shen davidshe...@gmail.com wrote:
Hi,
On http://localhost:8983/solr/#/collectioncn/analysis, if I input % in
the field and click the Analysis Values button, I
Hi,
We're starting to see issues on a test cluster where Solr breaks up query
string parameters that are either defined in the request handler or are passed
in the URL in the initial request.
In our request handler we have an SF parameter for edismax (SOLR-3925):
str name=sf
Hi all,
I wonder if this is a bug or expected behavior:
I have some documents indexed; 3 of them contain Thomas and 4 of them contain
Michael, but none of them contain both. A search for
http://localhost:8983/solr/collection1/browse?defType=edismax&q=(Thomas+Michael)
returns 0 results as expected
On 11/01/2012 05:06 AM, Jegannathan Mehalingam wrote:
Here is my code which uses CommonsHttpSolrServer:
String url = "http://localhost:8983/solr/#/solr/update/";
Your Solr URL looks wrong, try this:
http://localhost:8983/solr/update/
or maybe this one, if you have a core named solr:
JIRA ticket created: https://issues.apache.org/jira/browse/SOLR-4170
On 27 November 2012 23:41, Mark Miller markrmil...@gmail.com wrote:
Perhaps you can file a JIRA ticket with your findings?
- Mark
On Nov 27, 2012, at 5:31 PM, Marcin Rzewucki mrzewu...@gmail.com wrote:
Yes, I have and
You will need to create a context XML file for each Solr instance to
tell it where to find its indexes (aka solr_home). Obviously, each Solr
instance will want a different one.
See:
http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat
Note that it says to create a file
It doesn't sound exactly like a problem we experienced some time ago,
where long requests were mixed up during transport. Jetty was to blame.
It might be Jetty that f up your request too? SOLR-4031. Are you still
running 8.1.2?
Regards, Per Steffensen
Markus Jelsma skrev:
Hi,
We're
Hi Per,
We're running Tomcat6 with today's checkout from trunk. I cannot remember
having seen it before, and I cannot reproduce it manually in my browser, only in
concurrent stress tests firing queries.
Thanks
Markus
-Original message-
From:Per Steffensen st...@designware.dk
OK, I managed to fix it: the universal charset error is caused by a missing
dependency.
Just download universalchardet-1.0.3.jar and put it in your extraction lib.
The Microsoft errors will probably be fixed in a future release of the POI
jars (v3.9 didn't fix this error).
--
View this message in
When using SolrCloud, the dataimport.properties file goes to a different
location. See https://issues.apache.org/jira/browse/SOLR-3165 for more
information.
Also, while this feature works in 4.0.0, it is currently broken in
(not-released) 4.1 (branch_4x) and the development Trunk. This
Thank you for your response.
I saw SOLR-3165, but I can't really locate this different location, even
when searching for this file with the find command.
According to the patch and warning message that I got (WARNING: Could not
read DIH properties from
On 12/12/2012 5:51 AM, Burgmans, Tom wrote:
I have some documents indexed; 3 of them contain “Thomas” and 4 of
them contain “Michael”, but none of them contain both. A search for
http://localhost:8983/solr/collection1/browse?defType=edismax&q=(Thomas+Michael)
You can only search against terms that are stored in your index. If you
have applied index time synonyms, you can't remove them at query time.
You can, however, use copyField to clone an incoming field to another
field that doesn't use synonyms, and search against that field instead.
Upayavira
I have set <solrQueryParser defaultOperator="AND"/> in the schema (and
restarted Solr), and tested again with
http://localhost:8983/solr/collection1/browse?defType=edismax&q=(Thomas+Michael)+OR+xxxmatchesnothingxxx&q.op=AND
note the extra parameter. Still it returns the 7 documents that match
Query-time analyzers are still applied, even if you include a string in quotes.
Would you expect foo to not match Foo just because it's enclosed in quotes?
Also look at this, someone who had similar requirements:
Query time synonyms have known problems. They are slower, cause incorrect IDF,
and don't work for phrase synonyms.
Apply synonyms at index time and you will have none of those problems.
See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
wunder
On Dec
On 12/12/2012 10:27 AM, Burgmans, Tom wrote:
I have set <solrQueryParser defaultOperator="AND"/> in the schema (and
restarted Solr), and tested again with
http://localhost:8983/solr/collection1/browse?defType=edismax&q=(Thomas+Michael)+OR+xxxmatchesnothingxxx&q.op=AND
note the extra parameter.
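A minimal sketch of how the request quoted above can be assembled so that each parameter stays separate and the query value is URL-encoded. The host, core, and parameter values come from the quoted URL; the helper name build_query_url is only illustrative.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Base URL taken from the request quoted in the message above.
BASE = "http://localhost:8983/solr/collection1/browse"


def build_query_url(q, q_op=None):
    """Build an edismax request URL; build_query_url is a hypothetical helper."""
    params = [("defType", "edismax"), ("q", q)]
    if q_op:
        # q.op overrides the default operator for this request only.
        params.append(("q.op", q_op))
    # urlencode joins the pairs with '&' and escapes spaces and parentheses.
    return BASE + "?" + urlencode(params)


url = build_query_url("(Thomas Michael) OR xxxmatchesnothingxxx", q_op="AND")
print(url)
```

Parsing the result back with parse_qs(urlsplit(url).query) is a quick way to confirm that q, q.op, and defType arrive as three distinct parameters.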
@wunder
It is a misconception (well, supported by that wiki description) that the
query time synonym filter has these problems. It is actually the default
parser that is causing these problems. Look at this if you still think
that index time synonyms are a cure for all:
Query parsers cannot fix the IDF problem or make query-time synonyms faster.
Query synonym expansion makes more search terms. More search terms are more
work at query time.
The IDF problem is real; I've run up against it. The rarest variant of the
synonym has the highest score. This
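To illustrate the IDF point, here is a small sketch that assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)). The synonym variants and their document frequencies are made-up numbers, not data from this thread.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as in the classic Lucene TF-IDF similarity (assumed here)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical corpus size and document frequencies for three synonym variants.
num_docs = 1000
doc_freqs = {"television": 200, "TV": 50, "telly": 2}

idfs = {term: classic_idf(df, num_docs) for term, df in doc_freqs.items()}
rarest = max(idfs, key=idfs.get)

for term, idf in sorted(idfs.items(), key=lambda kv: kv[1]):
    print(f"{term}: idf={idf:.3f}")
print("highest idf:", rarest)
```

With query-time expansion each variant is scored with its own IDF, so the variant with the lowest document frequency contributes the largest weight.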
Ok, that makes sense and it's probably workable, but, it's still more awkward
than having code and configuration deployed together to individual machines.
For example, for a deploy of new software/config we need to 1) first upload
config to ZK, then 2) deploy new software to the nodes.
What
Hello everyone.
I have developed a standalone web app with a custom API that dispatches
queries to SolrCloud using the CloudSolrServer implementation. I'm
testing with a single ZooKeeper instance installed in an Amazon instance.
Solr servers are deployed in two Amazon instances and I
Yes /browse returns velocity stuff, but I mostly add wt=xml in the query. And
yes, I looked at the parsedquery feedback that debugQuery=true provides. That
basically confirms my idea that the implicit AND is indeed switched to an
implicit OR in case an explicit OR is somewhere else present in
I've read the following in the SolrCloud FAQ:
Q: I'm seeing lots of session timeout exceptions - what to do?
A: Try raising the ZooKeeper (http://wiki.apache.org/solr/ZooKeeper) session
timeout by editing solr.xml - see the zkClientTimeout attribute. The
minimum session timeout is
On Dec 12, 2012, at 12:52 PM, Mike Schultz mike.schu...@gmail.com wrote:
Ok, that makes sense and it's probably workable, but, it's still more awkward
than having code and configuration deployed together to individual machines.
For example, for a deploy of new software/config we need to
britske wrote
Ah; ok. But still, my first suggestion is still what I think you could do,
except that the algorithm is simpler -- return the first matching 'y' in the
document where the point matches the query. Alternatively, if you're
confident the number of matching documents (hotels) is
We could still replicate the issue in the 4.1 branch, i.e. queries going to one
server (numShards=1) are being distributed among all the servers, which is
creating CPU spikes on all the servers in the cloud. Do you think this
behavior is expected, or will it be fixed in the 4.1 release?
But couldn't the IDF problem be fixed by applying the same IDF to all synonyms,
e.g. via DisjunctionMaxQuery? (Maybe the ideal would be an average, not a max.)
(E)dismax applies this query per-field, but AFAICT there is nothing stopping
anybody (modulo query parser construction :) ) from using
On Wed, Dec 12, 2012 at 5:03 PM, sausarkar sausar...@ebay.com wrote:
We could still replicate the issue in the 4.1 branch, i.e. queries going to one
server (numShards=1) are being distributed among all the servers, which is
creating CPU spikes on all the servers in the cloud. Do you think this
This is somewhat confusing. You say that box2 is the slave, yet they're not
connected? Then you need to copy the solr home/data index from box 1 to
box 2 manually (I'd have box2 solr shut down at the time) and restart Solr.
Why can't the boxes be connected? That's a much simpler way of going
Hmm, I've gotten this very wrong :) - DisjunctionMaxQuery will operate per-doc,
so using it in the way I suggested will not allow for synonym IDF leveling
across documents. Also, scoring obviously includes more factors than IDF.
On Dec 12, 2012, at 5:18 PM, Steve Rowe sar...@gmail.com wrote:
On Wed, Dec 12, 2012 at 5:49 PM, Michael Ryan mr...@moreover.com wrote:
When sorting a TrieLongField, should there be any expected difference in
query speed when sorting ascending vs sorting descending? I'm seeing desc
queries sometimes take 10x longer than asc queries. I can provide more
Perhaps if there are a lot more ties on one end vs the other?
Or if the values being sorted on aren't that random? Do they naturally
increase like a timestamp?
It's a unique id field. The id is a simple sequential id, so docs with a lower
doc id will naturally also have a lower id.
I think
Sure, synonyms have lots of issues and choosing index vs. query is simply
picking your poison, but it all depends on your app and your data and your
user expectations, and you, the developer, have tools to moderate a lot of
these issues.
Index-time synonyms have the problem (among others)
If you have tons of content, you can do selective reindexing. You only need to
reindex the docs containing the new terms. If I add a synonym for
babysitter and baby sitter, then I can do a search for documents containing
either of those, and only reindex those.
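As a rough sketch of the selective-reindexing idea, the query below selects only the documents that contain one of the newly added synonym terms. The field name text and the helper name are assumptions for illustration; q, fl, and rows are standard Solr request parameters.

```python
from urllib.parse import urlencode


def selective_reindex_query(terms, field="text"):
    """Build an OR query over the terms; multi-word terms become phrases.

    The default field name "text" is an assumption, not taken from the thread.
    """
    clauses = []
    for term in terms:
        # Quote multi-word terms so they are searched as a phrase.
        clauses.append('"' + term + '"' if " " in term else term)
    return field + ":(" + " OR ".join(clauses) + ")"


q = selective_reindex_query(["babysitter", "baby sitter"])
# Ask only for document ids; those ids are the documents to reindex.
params = urlencode({"q": q, "fl": "id", "rows": 1000})
print(q)
print(params)
```

The ids returned by such a request would then be fed to whatever indexing job normally rebuilds those documents.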
Reverse weighting to even
Another great use case for synonyms is misspellings. I saw one synonym list
in which the top synonym was the phrase dead mouse (which doesn't look
misspelled at all); I won't tell you what its proper synonym was, other
than to say that it was VERY app/culture-dependent. It was also interesting
I prefer fuzzy search for misspellings. Solr does a very nice job with those,
weighting them by the similarity to the matched term.
wunder
On Dec 12, 2012, at 4:45 PM, Jack Krupansky wrote:
Another great use case for synonyms is misspellings. I saw one synonym list
in which the top synonym
Well, this IDF problem has more sides. So, let's say your synonym file
contains multi-token synonyms (it does, right? or perhaps you don't need
it? well, some people do)
TV, TV set, TV foo, television
if you use the default synonym expansion, when you index 'television'
you have increased
Yeah, semi-sort-of a known issue. Basically, the “mm” parameter is ignored
below the top-level of the query (i.e., within parentheses, except for some
special cases.)
Here’s another example where the implicit “AND” operator gets ignored:
All of the applications I've seen with user control over synonym expansion
were recall-oriented: the "give me all matches for X" kind of problem. So
ranking is not as important.
wunder
On Dec 12, 2012, at 5:23 PM, Roman Chyla wrote:
Well, this IDF problem has more sides. So, let's say your
Incidentally, you can also use debug=query to just get the query expansion
info without all the query explain and timing noise.
-- Jack Krupansky
-Original Message-
From: Shawn Heisey
Sent: Wednesday, December 12, 2012 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: edismax:
It no-ops for me. I entered abc%def, which shows this URL:
http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=abc%25def&analysis.query=&analysis.fieldname=text&verbose_output=1
But no analysis results whatsoever! No exception either.
This is in 4.0.
-- Jack Krupansky
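The %25 in the URL above is the percent-encoded form of a literal % character. A minimal sketch using Python's standard library shows the round trip for the value abc%def from the message:

```python
from urllib.parse import quote, unquote

# The raw field value entered in the analysis form.
raw = "abc%def"

# safe="" makes quote() escape every reserved character, including '/'.
encoded = quote(raw, safe="")
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

A bare % in a URL is read as the start of an escape sequence, which is why the literal character has to be sent as %25.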
Hi all,
Could anyone tell me if Solr supports CGI?
Thanks and regards,
Romita
:
http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=abc%25def&analysis.query=&analysis.fieldname=text&verbose_output=1
:
: But no analysis results whatsoever! No exception either.
It's a javascript error, so you probably won't see it w/o a browser add
on.
CGI as in Computer Gateway Interface? No.
Otis
--
SOLR Performance Monitoring - http://sematext.com/spm
On Dec 12, 2012 9:20 PM, Romita Saha romita.s...@sg.panasonic.com wrote:
Hi all,
Could anyone tell me if Solr supports CGI?
Thanks and regards,
Romita
Yes, that's right. CGI as in Computer Gateway Interface. I understand that
you say Solr cannot support CGI. Am I correct? Could you kindly explain in
more detail?
Thanks and regards,
Romita
From: Otis Gospodnetic otis.gospodne...@gmail.com
To: solr-user@lucene.apache.org,
Date:
I am a little confused about what exactly a CGI is. According to my
understanding, Common Gateway Interface tells the webserver how to pass
data back and forth to and from an application. User (client) requests for
a page -> webserver (CGI) -> server-side program (may be Solr). Could
you
Solr runs in a servlet container like Jetty or Tomcat. CGIs are history.
Mentioning them makes me think of OB1 meeting Luke Skywalker... That's the
name I haven't heard in a long long time
Otis
--
SOLR Performance Monitoring - http://sematext.com/spm
On Dec 12, 2012 9:48 PM, Romita Saha
Yes, that's right. CGI as in Computer Gateway Interface. I understand that
you say Solr cannot support CGI. Am I correct? Could you kindly explain in
more detail?
Your question makes us all scratch our heads. Solr is a java servlet that
runs in a servlet container. CGI is something that you
Yes, the information passing via URL params is the same, but CGIs don't run
in servlet containers. Maybe you can share what you are really after.
Otis
--
SOLR Performance Monitoring - http://sematext.com/spm
On Dec 12, 2012 9:53 PM, Romita Saha romita.s...@sg.panasonic.com wrote:
I am a little
Yes, that's right. CGI as in Computer Gateway Interface. I understand that
you say Solr cannot support CGI. Am I correct? Could you kindly explain in
more detail?
Your question makes us all scratch our heads. Solr is a java servlet that
runs in a servlet container. CGI is something that
: I am a little confused about what exactly a CGI is. According to my
: understanding, Common Gateway Interface tells the webserver how to pass
: data back and forth to and from an application. User (client) requests for
: a page -> webserver (CGI) -> server-side program (may be Solr). Could
On 12/12/2012 8:00 PM, Shawn Heisey wrote:
Yes, that's right. CGI as in Computer Gateway Interface. I understand that
you say Solr cannot support CGI. Am I correct? Could you kindly explain in
more detail?
Your question makes us all scratch our heads. Solr is a java servlet that
runs in a
I want to integrate Solr with a distributed server. I want to build an
interface in order to do this. This interface would basically pass the
data back and forth from solr to the server and vice versa. On receiving a
request from the server, this interface would add certain information to
the
Sounds like you want to put a proxy type server between clients and Solr
servers. That's doable. You can use any of the Solr client libraries to
talk to Solr from your proxy. We have done something very very similar
very recently in one of our internal projects at Sematext.
Otis
--
SOLR
I want to add the lucene dictionary of basic words, but with any database
that allows it to add it to lucene?
thanx..
Emily
--
View this message in context:
http://lucene.472066.n3.nabble.com/how-create-database-in-lucene-tp4026638.html
Sent from the Solr - User mailing list archive at
Hi Erick,
Sorry for creating the confusion. By slave, I mean the indexes on the client
machine will be a replica of the master; it is not the same as the slave in
the master-slave model. Below is the detail:
The system is being developed to support a search facility on 1000s of
systems, a majority of which will
I want to know how the score is calculated.
What are fieldWeight, fieldNorm, queryWeight and queryNorm? And what is the
formula to get the final score using fieldWeight, fieldNorm, queryWeight,
queryNorm, idf and tf?
Can anyone explain or provide some links?
Thanks,
Sangeetha
In our case it's the opposite. For our clients it is very important that every
synonym gets equal chances in the relevancy calculation. The fact that nol
scores higher than net operating loss, simply because its document frequency
is lower, is unacceptable and a reason to look for ways to
I am also busy with getting this clear. Here are my notes so far (by copying
and writing myself):
queryWeight = the impact of the query against the field
implementation: boost(query)*idf*queryNorm
boost(query) = boost of the field at query-time
Implication: hits in
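A minimal sketch of how these pieces can combine for a single-term query. The queryWeight line follows the notes above; the idf, tf, and fieldWeight formulas are my understanding of the classic Lucene TF-IDF similarity, so treat them as assumptions. All input numbers are made up.

```python
import math


def term_score(term_freq, doc_freq, num_docs, field_norm, query_norm, boost=1.0):
    """Score of one term in one field, assuming the classic TF-IDF similarity."""
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))  # rarer terms weigh more
    tf = math.sqrt(term_freq)                         # dampened term frequency
    query_weight = boost * idf * query_norm           # as in the notes above
    field_weight = tf * idf * field_norm              # assumed: tf * idf * fieldNorm
    return query_weight * field_weight


# Hypothetical values chosen so the arithmetic is easy to follow:
# idf = 1 + ln(10 / 10) = 1, tf = sqrt(4) = 2, fieldWeight = 2 * 1 * 0.5 = 1.
score = term_score(term_freq=4, doc_freq=9, num_docs=10,
                   field_norm=0.5, query_norm=1.0)
print(score)
```

Note that idf appears in both factors, so under this formula it enters the final score squared.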
I cleaned up the Solr schema by changing a small portion of the stored fields
to stored=false.
For 5000 documents (about 500M total size of original documents), I ran a
benchmark comparing the Solr index size between the schema before/after the
clean up.
The first run showed about 40%