Hi,
I found the solr-ruby gem (http://wiki.apache.org/solr/solr-ruby) really
inflexible in terms of specifying the handler. The Solr::Request::Select class
defines the handler as select, and all the other request classes inherit from it.
And since the methods in Solr::Connection use one of the classes from
Hi, I am using Solr for my searches. In it I found a synonyms.txt file in
which you can include synonyms manually for the words you want.
But I suppose it would be very hard to include synonyms manually for each
word, as my application has a large amount of data.
I want to know: is there any way that this
On Tue, Jun 28, 2011 at 12:54 PM, Romi romijain3...@gmail.com wrote:
Hi, I am using Solr for my searches. In it I found a synonyms.txt file in
which you can include synonyms manually for the words you want.
Please see
On 28.06.2011 09:24, Romi wrote:
But I suppose it would be very hard to include synonyms manually for each
word, as my application has a large amount of data.
I want to know: is there any way that this synonyms.txt file can be generated
automatically, referring to all dictionary words?
I don't get the point
You could add this filter after the NGram filter to prevent the phrase query
creation:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
Ludovic.
-
Jouve
France.
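For reference, a minimal sketch of what such an analyzer chain could look like in schema.xml; the tokenizer choice and gram sizes here are illustrative assumptions, not from the thread:

```xml
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
    <!-- PositionFilter collapses token positions so the query parser does
         not turn the overlapping grams into a phrase query -->
    <filter class="solr.PositionFilterFactory"/>
  </analyzer>
</fieldType>
```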
I'm looking for a way to index/search on terms that may or may not contain
spaces.
An example will explain better :
- Looking for healthcare, I want to find both healthcare and health
care.
- Looking for health care, I want to find both health care and
healthcare.
My other constraints are
- I
Yonik Seeley wrote:
On Sat, Jun 25, 2011 at 5:56 AM, marthinal
<jm.rodriguez.ve...@gmail.com> wrote:
sfield, pt and d can all be specified directly in the spatial
functions/filters too, and that will override the global params.
Unfortunately one must currently use lucene query
Hi,
I am not sure what the index number value is. It looks like an epoch time,
but in my case it points to one month back. However, I can see documents
which were added last week in the index.
Even after I did a commit, the index number did not change. Isn't it
supposed to change on
Please see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
No offence, but a simple Google search, or a search of the Wiki
would have turned this up. Please try such simpler avenues before
dashing off a message to the list.
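For anyone landing here later, the file that filter reads is plain text with comma-separated synonym groups and optional one-way mappings; a small sketch based on the example synonyms.txt that ships with Solr:

```
# comma-separated groups: all terms are treated as equivalent
GB,gib,gigabyte,gigabytes
Television, Televisions, TV, TVs
# one-way mapping: terms on the left are rewritten to the term on the right
sea biscuit, sea biscit => seabiscuit
```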
Gora, I have already read the
I don't want to add all dictionary words to my synonyms.txt, but I wanted to
include synonyms for the words which I have in my data... As you can
imagine, if I have, say, 1000 words, then it would be very tough to enter
synonyms for these 1000 words in synonyms.txt manually. I just want to know
Well you need to find word lists and/or a thesaurus.
This is one place to start:
http://wordlist.sourceforge.net/
I used the US/UK English word list for my synonyms for an index I have because
it contains both US and UK English terms; the list lacks some medical terms
though, so we
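If anyone wants to script the conversion, here is a rough sketch of turning a variant word list into synonyms.txt lines. The input format (pairs of US/UK spellings) is my assumption; the files in the actual download vary:

```python
# Sketch: convert (us, uk) spelling pairs, e.g. from a US/UK word list,
# into comma-separated synonym groups for Solr's synonyms.txt.
def pairs_to_synonym_lines(pairs):
    """Each (us, uk) pair becomes one 'us,uk' synonym group."""
    lines = []
    for us, uk in pairs:
        if us != uk:                 # identical spellings need no entry
            lines.append(f"{us},{uk}")
    return lines

variants = [("color", "colour"), ("center", "centre"), ("radio", "radio")]
for line in pairs_to_synonym_lines(variants):
    print(line)
```

Writing the resulting lines to synonyms.txt (one group per line) is all the SynonymFilterFactory needs.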
I had the same problem:
http://lucene.472066.n3.nabble.com/Results-with-and-without-whitespace-soccer-club-and-soccerclub-td2934742.html#a2964942
I also have the problem of duplicate docs.
I am indexing news articles. Every news article will have a source URL.
If two news articles have the same URL, only one needs to be indexed:
duplicates should be removed at index time.
On 23 June 2011 21:24, simon mtnes...@gmail.com wrote:
have you checked out
Create a hash from the URL and use that as the unique key; MD5 or SHA1 would
probably be good enough.
Cheers
François
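A minimal sketch of that suggestion in Python; the normalization step is my own assumption (real code might canonicalize URLs more aggressively):

```python
import hashlib

def url_key(url):
    """Derive a stable document key from a source URL via SHA-1."""
    normalized = url.strip()   # assumed: trim whitespace only
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

# The same URL always yields the same 40-character hex key, so re-adding
# the same article overwrites the existing document instead of duplicating it.
key = url_key("http://example.com/news/article-123")
```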
On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote:
I also have the problem of duplicate docs.
I am indexing news articles. Every news article will have a source URL.
If
Will it be possible to do spatial searches on multi-valued spatial
fields soon?
I have a latlon (point) field that is multi-valued and don't know how to
search against it
such that the lats and lons match up correctly, since they are split apart.
e.g. I have a document with 10 point/latlon
I am making the hash from the URL, but I can't use it as the uniqueKey because
I am using a UUID as the uniqueKey.
Since I am using Solr as the index engine only, and Riak (key-value
storage) as the storage engine, I don't want to overwrite on duplicate.
I just need to discard the duplicates.
2011/6/28
On 06/27/2011 11:23 AM, lee carroll wrote:
Hi Tod,
A list of keywords would be fine in a non multi valued field:
keywords : xxx yyy sss aaa
multi value field would allow you to repeat the field when indexing
keywords: xxx
keywords: yyy
keywords: sss
etc
Thanks Lee. The problem is I'm
Thank you for your answer.
I agree, I can manage predictable values through synonyms.
However, most data in this index are company and product names, sometimes
leading to rather strange syntax (a mix of upper/lower case, misplaced dashes
or spaces). One purpose of using Solr was to help in finding
Maybe there is a way to get Solr to reject documents that already exist in the
index, but I doubt it; maybe someone else can chime in here. You could do a
search for each document prior to indexing it to see if it is already in the
index, but that is probably non-optimal. Maybe it is easiest
I found the deduplication thing really useful, although I have not yet
started to work on it, as there are some other low-hanging fruits I have to
pick first. Will share my thoughts soon.
*Pranav Prakash*
temet nosce
Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
Google
(11/06/28 16:40), lboutros wrote:
You could add this filter after the NGram filter to prevent the phrase query
creation:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
Ludovic.
There is an option to avoid producing phrase queries,
Indeed, take a look at this:
http://wiki.apache.org/solr/Deduplication
I have not used it but it looks like it will do the trick.
François
On Jun 28, 2011, at 8:44 AM, Pranav Prakash wrote:
I found the deduplication thing really useful. Although I have not yet
started to
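For reference, the wiki's approach boils down to an update processor chain in solrconfig.xml along these lines; the signature field and source field names here are illustrative:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <!-- true = newer duplicates replace older ones; there is no built-in
         "discard the new document" mode -->
    <bool name="overwriteDupes">true</bool>
    <!-- fields the signature is computed from; "url" is illustrative -->
    <str name="fields">url</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```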
Hey François,
thanks for your suggestion. I followed the same link
(http://wiki.apache.org/solr/Deduplication);
they have two solutions: either make the hash the uniqueKey, or overwrite on
duplicate.
I don't need either.
I need discard-on-duplicate.
I have not used it but it looks like it will do
Thanks François Schiettecatte, the information you provided is very helpful.
I need to know one more thing: I downloaded one of the given dictionaries, but
it contains many files. Do I need to add all of these files' data into
synonyms.txt?
-
Thanks Regards
Romi
Mohammad,
just in case you meant it, I would like to discourage you from trying to
deduplicate *the search result*.
There are many things that go wrong if you do that; we had it in one version of
the ActiveMath search environment (which uses Lucene):
- paging is inappropriate
- the total count is wrong
Yeah, I read the overview, which suggests that duplicates can be prevented from
entering the index, and scanned the rest; it does not look like you can actually
drop the document entirely. Maybe I am missing something here.
François
On Jun 28, 2011, at 9:14 AM, Mohammad Shariq wrote:
Hey
Well no, you need to see which files (if any) suit your needs; they are
not all synonym files. I only needed the UK/US English file, and I needed to
process it into a format suitable for the synonyms file.
There may well be other word lists on the net suitable for your needs. I would
not
It is precisely this limitation which triggered me to develop a grid indexing
approach using Geohashes: https://issues.apache.org/jira/browse/SOLR-2155
This patch requires a Solr trunk release.
If you have a small number of distinct points in total, and you only need
filtering, then the geohash
On Tue, Jun 28, 2011 at 4:18 PM, Pranav Prakash pra...@gmail.com wrote:
I am not sure what the index number value is. It looks like an epoch time,
but in my case it points to one month back. However, I can see documents
which were added last week in the index.
The index version
I am a user of Solr 3.2 and I make use of the distributed search capabilities
of Solr using a fairly simple architecture of a coordinator + some shards.
Correct me if I am wrong: In a standard distributed search with
QueryComponent, the first query sent to the shards asks for fl=myUniqueKey or
Hi all,
I'm having some weird behavior with my dataimport script. Because of memory
issues, I've taken to doing a delta import by doing a full import with
clean=false. My dataimport config file is set up like:
entity name=findDelta query=SELECT id FROM mytable WHERE date_added >
: I'm streaming over the document content (presumably via tika) and its
: gathering the document's metadata which includes the keywords metadata field.
: Since I'm also passing that field from the DB to the REST call as a list (as
: you suggested) there is a collision because the keywords field
Hi,
According to the doc:
http://wiki.apache.org/solr/LanguageAnalysis#Chinese.2C_Japanese.2C_Korean
solr.SmartChineseWordTokenFilterFactory is for Simplified Chinese.
Does it work for Traditional Chinese too? If not, is there anything equivalent
for Traditional Chinese?
Thanks.
Thanks guys. Both the PositionFilterFactory and the
autoGeneratePhraseQueries=false solutions solved the issue.
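For later readers, the autoGeneratePhraseQueries fix is a fieldType attribute in schema.xml (available from Solr 3.x on); a minimal sketch, with the tokenizer and gram sizes as illustrative assumptions:

```xml
<fieldType name="text_ngram" class="solr.TextField"
           autoGeneratePhraseQueries="false" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>
```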
Hi,
I am facing multiple issues with Solr and I am not sure what happens in each
case. I am quite naive in Solr and there are some scenarios I'd like to
discuss with you.
We have a huge volume of documents to be indexed, somewhere around 5 million.
We have a full indexer script which essentially
You should modify the SolrCore for this, if I'm not mistaken.
Would extending LuceneQParserPlugin (solr 1.4) be an option for you?
On Tue, Jun 28, 2011 at 12:25 AM, Jamie Johnson jej2...@gmail.com wrote:
I have a need to take an incoming solr query and apply some additional
constraints to it
Can you use a facet search?
facet=true&facet.field=order_no&fq=order_no:(1234 OR 5678 OR
...)&fq=artist:Pink Floyd
On Mon, Jun 27, 2011 at 6:44 PM, Olson, Ron rol...@lbpc.com wrote:
Hi all-
I have a problem that I'm not sure how it can be (if it can be) solved in
Solr. I am using Solr 3.2 with
On 6/28/2011 1:38 PM, Pranav Prakash wrote:
- Will the commit by incremental indexer script also commit the
previously uncommitted changes made by full indexer script before it broke?
Yes, as long as the Solr instance hasn't crashed. Anything added but
not yet committed sticks around
hi
I'm looking at setting up multi-core indices but also have an existing
index. Can I run
this index alongside a new index set up as cores? On a dev machine
I've experimented with
simply adding solr.xml in solr home and listing the new cores in the
cores element, but this breaks the existing
index.
Nope. But you can move your existing index into a core in a multi-core
setup. But a multi-core setup is a multi-core setup, there's no way to
have an index accessible at a non-core URL in a multi-core setup.
On 6/28/2011 2:53 PM, lee carroll wrote:
hi
I'm looking at setting up multi core
Hi All,
I was searching around for documentation of the performance differences of
having a sharded, single schema, dynamic field set up vs. a multi-core,
static multi-schema setup (which I currently have), but I have not had much
luck finding what I am looking for. I understand commits and
Hello everyone,
I believe I am missing something very elementary. The following query
returns zero hits:
http://localhost:8983/solr/core0/select/?q=testabc
However, using solritas, it finds many results:
http://localhost:8983/solr/core0/itas?q=testabc
Do you have any idea what the issue may
Quick question,
Is there a way with Solr to conditionally update a document on the unique
id? Meaning: default add behavior if the id is not already in the index, and
do not touch the index if it is already there.
Deletes are not important (no sync issues).
I am asking because I noticed with deduplication turned on,
Hi Walter, probably solritas is using dismax with a set of fields in the
qf parameter, while with your first query you are just querying the
default field.
On Tue, Jun 28, 2011 at 5:07 PM, Walter Closenfleight
walter.p.closenflei...@gmail.com wrote:
Hello everyone,
I believe I am
We are trying to get edismax to handle collocations mapped to a single
token. To do so we need to manipulate the chunks (as Hoss referred to
them in http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/)
generated by the dismax parser. We have numerous collocations (terms of
speech
But a multi-core setup is a multi-core setup, there's no way to have an
index accessible at a non-core URL in a multi-core setup.
Isn't there? What about the defaultCoreName parameter? From the wiki: The
name of a core that will be used for requests that don't specify a core. If
you have one core and
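A sketch of the solr.xml this describes, with defaultCoreName so requests to the old non-core URL keep working; the core names and directories are illustrative:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="main">
    <!-- the existing index, moved into its own instanceDir -->
    <core name="main" instanceDir="main"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```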
I am trying to create a feature that ranks search results by the formula
sum(weight1 * text relevance score, weight2 * price), where weight1 and
weight2 are numeric values that can be changed to influence the search
results.
I am sending the following query params to the Solr instance
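One way to express that formula in Solr's function-query syntax; this is a hypothetical sketch (not the poster's actual params), where query($qq) yields the relevance score of the inner query and the weights 0.8/0.2 are placeholders:

```
q={!func}sum(product(0.8, query($qq)), product(0.2, price))
&qq=ipod
&fl=*,score
```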
According to the docs on lucene query syntax:
Starting with Lucene 1.9 an additional (optional) parameter can specify the
required similarity. The value is between 0 and 1, with a value closer to 1
only terms with a higher similarity will be matched.
I was messing around with this and started
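For concreteness, the syntax in question is the tilde operator on a term (example mine):

```
title:roam~0.8
```

With the parameter closer to 1, only terms very close to "roam" (such as "roams") match; lowering it also admits more distant edits such as "foam".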
Using RAMDirectory really does not help performance. Java garbage
collection has to work around all of the memory taken by the segments.
It works out that Solr works better (for most indexes) without using
the RAMDirectory.
On Sun, Jun 26, 2011 at 2:07 PM, nipunb ni...@walmartlabs.com wrote: