Mike
I would be very interested in the answer to that question too. My hunch is that
the answer is no too. I have a few text databases that range from 200MB to
about 60GB with which I could run some tests. I will have some downtime in
early July and will post results.
From what I can tell the
Wayne
I am not sure what you mean by 'changing the record'.
One option would be to implement something like the synonyms filter to generate
the TC for SC when you index the document, which would index both the TC and
the SC in the same location. That way your users would be able to search with
Well you need to find word lists and/or a thesaurus.
This is one place to start:
http://wordlist.sourceforge.net/
I used the US/UK english word list for my synonyms for an index I have because
it contains both US and UK english terms, the list lacks some medical terms
though so we just
Create a hash from the url and use that as the unique key, md5 or sha1 would
probably be good enough.
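
A minimal sketch of that idea in Python, assuming the key is built from the raw URL string with nothing more than surrounding whitespace stripped. The function name url_to_key is hypothetical and not part of Solr; any further URL normalisation is left to you.

```python
import hashlib


def url_to_key(url: str, algorithm: str = "sha1") -> str:
    """Return the hex digest of a URL, for use as a unique key value.

    url_to_key is a hypothetical helper name; 'algorithm' may be any name
    accepted by hashlib.new(), for example "md5" or "sha1".
    """
    digest = hashlib.new(algorithm)
    # Only surrounding whitespace is stripped; two URLs that differ in any
    # other way (case, trailing slash, parameters) get different keys.
    digest.update(url.strip().encode("utf-8"))
    return digest.hexdigest()
```

The same URL always yields the same key, so indexing the same article twice would target the same unique key rather than creating a second document.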
Cheers
François
On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote:
> I also have the problem of duplicate docs.
> I am indexing news articles, Every news article will have the source URL,
> I
> Since I am using SOLR as index engine Only and using Riak(key-value
> storage) as storage engine, I dont want to do the overwrite on duplicate.
> I just need to discard the duplicates.
>
>
>
> 2011/6/28 François Schiettecatte
>
>> Create a hash from the url an
le <http://www.google.com/profiles/pranny>
>
>
> 2011/6/28 François Schiettecatte
>
>> Maybe there is a way to get Solr to reject documents that already exist in
>> the index but I doubt it, maybe someone else can chime in here. You
>> could do a search for
work on it, as there are some other low hanging fruits I've to
>>> capture. Will share my thoughts soon.
>>>
>>>
>>> *Pranav Prakash*
>>>
>>> "temet nosce"
>>>
>>> Twitter <http://twitter.com/pranavprakash>
wrote:
> Thanks François Schiettecatte, the information you provided is very helpful.
> I need to know one more thing: I downloaded one of the given dictionaries, but
> it contains many files. Do I need to add the data from all of these files to
> synonyms.txt?
>
> -
> Thanks & Regard
Indeed, I find the Porter stemmer to be too 'aggressive' for my taste; I prefer
the EnglishMinimalStemFilterFactory, with the caveat that it depends on your
data set.
Cheers
François
On Jun 29, 2011, at 6:21 AM, Ahmet Arslan wrote:
>> Hi, when i query for "elegant" in
>> solr i get results fo
I would run that word through the analyzer. I suspect that the word 'teste' is
being stemmed to 'test' in the index; at least that is the first place I would
check.
François
On Jun 30, 2011, at 2:21 PM, Celso Pinto wrote:
> Hi everyone,
>
> I'm having some trouble figuring out why a query wit
Celso Pinto wrote:
>> Hi François,
>>
>> it is indeed being stemmed, thanks a lot for the heads up. It appears
>> that stemming is also configured for the query so it should work just
>> the same, no?
>>
>> Thanks again.
>>
>> Regards,
Hi
I don't think that anyone has run such benchmarks. In fact, this topic came up
two weeks ago and I volunteered to do that because I have some spare time this
week, so I am going to run some benchmarks this weekend and report back.
The machine I have to do this is a Core i7 960, 24GB,
You just need to provide a second sort field along the lines of:
sort=score desc, author desc
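
A rough sketch of how that sort parameter could be URL-encoded from Python. The base URL and the build_select_url helper are assumptions for illustration only; the 'author' field is taken from the line above and would need to be sortable in your schema.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, query: str) -> str:
    """Build a select URL that breaks score ties on the author field.

    build_select_url is a hypothetical helper, and base_url is whatever
    your own Solr select handler answers on.
    """
    params = {
        "q": query,
        # Primary sort on score, secondary sort on author for equal scores.
        "sort": "score desc, author desc",
    }
    return base_url + "?" + urlencode(params)
```

For alphabetical (A to Z) ordering of the tied documents, the second clause would presumably be 'author asc' instead of 'author desc'.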
François
On Jul 12, 2011, at 6:13 AM, Lox wrote:
> Hi,
>
> In the case where two or more documents are returned with the same score, is
> there a way to tell Solr to sort them alphabetically?
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
http://wiki.apache.org/solr/SolrQuerySyntax
François
On Jul 13, 2011, at 1:29 PM, GAURAV PAREEK wrote:
> Hello,
>
> What are wildcards we can use with the SOLR ?
>
> Regards,
> Gaurav
Easy, the hyphen is out on its own (with spaces on either side) and is probably
getting removed from the search by the tokenizer. Check your analysis.
François
On Jul 14, 2011, at 6:05 AM, roySolr wrote:
> It looks like it's still not working.
>
> I send this to SOLR: q=arsenal \- london
>
>
I think anything but a 200 OK means it is dead like the proverbial parrot :)
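
As a sketch of that rule in Python, assuming the ping URL is reachable over plain HTTP: a refused connection, a timeout, or any non-200 status all count as "dead". The function name solr_is_alive is hypothetical.

```python
import http.client
import urllib.request


def solr_is_alive(ping_url: str, timeout: float = 5.0) -> bool:
    """Return True only if the ping URL answers with HTTP 200.

    solr_is_alive is a hypothetical helper. Connection errors, timeouts
    and HTTP error statuses are all treated as 'not alive'.
    """
    try:
        with urllib.request.urlopen(ping_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, so this covers
        # 'connection refused', timeouts, and 4xx/5xx responses.
        return False
```

This way the caller never needs a JSON body to decide the status; no response at all is simply reported as False.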
François
On Jul 19, 2011, at 7:42 AM, Romi wrote:
> But the problem is when solr server is not running
> *"http://host:port/solr/admin/ping"*
>
> will not give me any json response
> then how will i get the status :(
>
You need to do something like this in the ./conf/tomcat server.xml file:
See 'URIEncoding' in http://tomcat.apache.org/tomcat-7.0-doc/config/http.html
Note that this will assume that the encoding of the data is in utf-8 if (and
ONLY if) the charset parameter is not set in the HTTP request
Check your analyzers to make sure that these characters are not getting
stripped out in the tokenization process, the url for 3.3 is somewhere along
the lines of:
http://localhost/solr/admin/analysis.jsp?highlight=on
And you should indeed be searching on "\#test".
François
On Jul 2
Adding to my previous reply, I just did a quick check on the 'text_en' and
'text_en_splitting' field types and they both strip leading '#'.
Cheers
François
On Jul 22, 2011, at 10:49 AM, Shawn Heisey wrote:
> On 7/22/2011 8:34 AM, Jason Toy wrote:
>> How does one search for words with character
FWIW, here is the process I follow to create a log4j-aware version of the
Apache Solr war file and the corresponding log4j.properties files.
Have fun :)
François
##
#
# Log4J configuration for SOLR
#
# http://wiki.apache.org/solr/Sol
I get slf4j-log4j12-1.6.1.jar from
http://www.slf4j.org/dist/slf4j-1.6.1.tar.gz, it is what interfaces slf4j to
log4j, you will also need to add log4j-1.2.16.jar to WEB-INF/lib.
François
On Jul 26, 2011, at 3:40 PM, O. Klein wrote:
>
> François Schiettecatte wrote:
>>
>>
Note that the Qtime in the response packet is the search, exclusive of
> assembling the response so that's probably a good number to measure.
>
> Best
> Erick
>
> On Fri, Jul 8, 2011 at 8:01 AM, jame vaalet wrote:
>> i would prefer every setting to be in its defa
I have not seen this mentioned anywhere, but I found a useful 'trick' to
restart solr without having to restart tomcat. All you need to do is 'touch'
the solr.xml in the solr.home directory. It can take a few seconds but solr
will restart and reload any config.
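
For anyone scripting this, a small Python equivalent of the 'touch' step. The touch_solr_xml name is hypothetical, and whether the container actually reloads after the timestamp changes depends on your setup, as described above.

```python
from pathlib import Path


def touch_solr_xml(solr_home: str) -> Path:
    """Update the modification time of solr.xml in the given solr.home.

    touch_solr_xml is a hypothetical helper; it refuses to create the
    file if it is missing, since an empty solr.xml would not be useful.
    """
    solr_xml = Path(solr_home) / "solr.xml"
    if not solr_xml.is_file():
        raise FileNotFoundError(f"no solr.xml found in {solr_home}")
    # Same effect as the 'touch' command: only the timestamps change.
    solr_xml.touch()
    return solr_xml
```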
Cheers
François
On Jul 27, 201
That seems a little far fetched, have you checked your analysis?
François
On Jul 31, 2011, at 4:58 PM, randohi wrote:
> One of our clients (a hot girl!) brought this to our attention:
> In this document there are many f* words:
>
> http://sec.gov/Archives/edgar/data/1474227/00014742271032/
Indeed, the analysis will show whether the term is a stop word: the term gets
removed by the stop filter, and turning on verbose output shows that.
François
On Jul 31, 2011, at 6:27 PM, Shashi Kant wrote:
> Check your Stop words list
> On Jul 31, 2011 6:25 PM, "François Schiettecatte"
Assuming you are running on Linux, you might want to check /var/log/messages
too (the location might vary); I think the kernel logs forced process
termination there. I recall that the kernel will usually pick the process
consuming the most memory, though there may be other factors involved too.
Franç
Sounds to me like you are looking for HTTP Persistent Connections (connection
keep-alive as opposed to close), and a singleton object. This would be outside
SOLR per se.
A few caveats though, I am not sure if tomcat supports keep-alive, and I am not
sure how SOLR deals with multiple requests co
Merlin
Ü encodes to two bytes in utf-8 (%C3%9C), and one in iso-8859-1 (%DC) so it
looks like there is a charset mismatch somewhere.
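
A quick way to see the two encodings side by side, sketched in Python with the standard library only. The percent_encode helper name is just for illustration.

```python
from urllib.parse import quote, unquote


def percent_encode(text: str, charset: str) -> str:
    """Percent-encode text using the given character set."""
    return quote(text, encoding=charset)


utf8_form = percent_encode("Ü", "utf-8")         # two bytes: %C3%9C
latin1_form = percent_encode("Ü", "iso-8859-1")  # one byte: %DC

# Decoding the iso-8859-1 form as utf-8 cannot recover the letter;
# unquote() substitutes the Unicode replacement character instead.
mismatched = unquote(latin1_form, encoding="utf-8")
```

If the client sends one form and the server decodes with the other charset, the query term no longer matches what is in the index.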
Cheers
François
On Aug 27, 2011, at 6:34 AM, Merlin Morgenstern wrote:
> Hello,
>
> I am having problems with searches that are issued from spiders that
Giovanni
stored="true" means the data is stored in the index and can be returned with
the search results (see the 'fl' parameter). This is independent of
indexed="true", which means that you can store but not index a field.
Best regards
François
On Jun 30, 2012, at 9:57 AM, Giovanni Gherdovich wrote:
On Jul 11, 2012, at 2:52 PM, Shawn Heisey wrote:
> On 7/2/2012 2:33 AM, Nabeel Sulieman wrote:
>> Argh! (and hooray!)
>>
>> I started from scratch again, following the wiki instructions. I did only
>> one thing differently; put my data directory in /opt instead of /home/dev.
>> And now it works!
I would create two indices, one with your content and one with your ads. This
approach would allow you to precisely control how many ads you pull back and
how you merge them into the results, and you would be able to control schemas,
boosting, default fields, etc. for each index independently.
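
The merging step could look something like the Python sketch below. It assumes you have already queried both indices and hold two plain lists of results; merge_ads and its 'every' and 'max_ads' parameters are hypothetical names, not anything Solr provides.

```python
from typing import List, Sequence


def merge_ads(content: Sequence[str], ads: Sequence[str],
              every: int = 3, max_ads: int = 2) -> List[str]:
    """Interleave ads into content results.

    One ad is placed after every 'every' content results, up to 'max_ads'
    ads in total. All names here are illustrative assumptions.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    merged: List[str] = []
    ads_used = 0
    ads_allowed = min(max_ads, len(ads))
    for position, doc in enumerate(content, start=1):
        merged.append(doc)
        # Insert the next ad once a full group of content results is out.
        if position % every == 0 and ads_used < ads_allowed:
            merged.append(ads[ads_used])
            ads_used += 1
    return merged
```

Because the ads come from their own index, the number pulled back and their placement can be changed here without touching the content query at all.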
You should check this at pcper.com:
http://pcper.com/ssd-decoder
http://pcper.com/content/SSD-Decoder-popup
Specs for a wide range of SSDs.
Best regards
François
On Aug 23, 2012, at 5:35 PM, Peyman Faratin wrote:
> Hi
>
> Is there a SSD brand and spec that the community re
What is probably going on is that the response is not being interpreted as
UTF-8 but as some other encoding.
What are you using to display the response?
François
On Aug 28, 2012, at 8:08 AM, zehoss wrote:
> Hi,
> at the beginning I would like to apologize for my English. I hope my message
> will
Aaron
The best way to make sure the index is cached by the OS is to just cat it on
startup:
find /path/to/solr/index -type f -exec cat {} + > /dev/null
Just make sure your index is smaller than RAM otherwise data will be rotated
out.
Memory mapping is built on the virtual memory system, and I suspect
John
You can still use leading wildcards even if you don't have the
ReversedWildcardFilterFactory in your analysis but it means you will be
scanning the entire dictionary when the search is run which can be a
performance issue. If you do use ReversedWildcardFilterFactory you won't have
that perf
I suspect it is just part of the wildcard handling; maybe someone can chime in
here. You may need to catch this before it gets to SOLR.
François
On Nov 12, 2012, at 5:44 PM, johnmu...@aol.com wrote:
> Thanks for the quick response.
>
>
> So, I do not want to use ReversedWildcardFilterFactory,