Hi,
I'd like to use the HunspellStemFilterFactory to improve my search results.
Why isn't there an inject argument, like in solr.PhoneticFilterFactory, to
add tokens instead of replacing them?
I don't want to replace them, because documents with the unstemmed
word should be more relevant.
Thanks.
Hi Mark!
You could help yourself by creating an additional field. One field would
hold the stemmed version and the other one would hold the unstemmed
version.
This would allow for a higher boost on the unstemmed field.
Use copyField for convenience to copy the content from one field to the
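A rough sketch of what Marian's suggestion could look like (the field and type names here are made up for illustration, not taken from any real schema):

```xml
<!-- schema.xml: keep an unstemmed field and a stemmed copy of it -->
<field name="title" type="text_unstemmed" indexed="true" stored="true"/>
<field name="title_stemmed" type="text_hunspell" indexed="true" stored="false"/>
<copyField source="title" dest="title_stemmed"/>
```

Then in the (e)dismax qf, boost the unstemmed field higher, e.g. qf=title^3 title_stemmed, so exact-form matches score above stemmed ones.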
Hi Alan,
Solr can do this fast and easily, but I wonder if a simple key-value store
wouldn't fit your needs better.
Do you really only need to query by chart_id, or do you also need to
query by time range?
In either case, as long as your data fits into an in-memory database, I
would
Hello friends,
I have integrated Solr in my Alfresco document management server.
I want to search on the metadata of documents stored in Alfresco, and I also
want to display it in the search results.
I have added the meta tags as fields in schema.xml, but I am still unable to
search on the metadata of the
Hi Marian,
thanks for your answer.
Using a copyField is a good idea.
Mark
2011/12/5 Marian Steinbach marian.steinb...@gmail.com:
Hi Mark!
You could help yourself by creating an additional field. One field would
hold the stemmed version and the other one would hold the unstemmed
version.
Hello,
I have one Solr instance and I'm very happy with it. Now we have multiple
daily updates,
and I see that the response time is slower when doing an update. I think I need
some master-slave replication. Now my question is: is a slave slower when
there is a replication running from master to
Hi
I have been working with ElasticSearch for a while now, and find it very
cool. Unfortunately we are no longer allowed to use ElasticSearch in our
project. Therefore we are looking for alternatives - Solr(Cloud) is an
option.
I have been looking at SolrCloud and worked through the
Hi all,
I am looking for a solution where the facets are computed based on the
current page of Solr documents.
For example:
say I have the query *:* and set start=0 and rows=10, and then I want facets on
one of the fields in the 10 docs obtained, and not on all the docs for
which the query was
If I have a fixed list in solrconfig for my qf, along with their
boosts, and I then specify field names directly in q (where I could
also override the boosts), are the boosts left in place, or reset to 1?
<str name="qf">
  this^3
  that^2
  other^9
</str>
ie q=field1:+(this that) +(other)
--
Hi
My guess is that the work for achieving
http://wiki.apache.org/solr/NewSolrCloudDesign has begun on the solrcloud
branch. It is hard to follow what is going on and how to use what
has been achieved - you cannot follow the examples on
http://wiki.apache.org/solr/SolrCloud anymore (e.g. there
Hi, AFAIK SolrCloud still doesn't support replication, that's why in the
example you have to copy the directory manually. Replication has to be
implemented by using the SolrReplication as you mentioned or use some kind
of distributed indexing (you'll have to do it yourself). SolrReplication
stuff
Tomás Fernández Löbbe wrote:
Hi, AFAIK SolrCloud still doesn't support replication, that's why in the
example you have to copy the directory manually. Replication has to be
implemented by using the SolrReplication as you mentioned or use some kind
of distributed indexing (you'll have to do it
Hi Robert, the answer depends on the query parser you are using. If you are
using the edismax query parser, then the qf will only be used when you
don't specify any field in the q parameter. In your example the resulting
query will be boolean queries for this and that in field1, and a
DisMax
You could try adding a
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
Regards,
Tomás
On Mon, Dec 5, 2011 at 6:01 AM, Marian Steinbach marian.steinb...@gmail.com
wrote:
Hi!
I am surprised to find an empty string as the most frequent index term in
one
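For what it's worth, a hedged sketch of where a LengthFilterFactory could sit in the analyzer chain to keep zero-length terms out of the index (the type name and min/max values are illustrative only):

```xml
<fieldType name="text_general" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- drop empty tokens (and absurdly long ones) before indexing -->
    <filter class="solr.LengthFilterFactory" min="1" max="256"/>
  </analyzer>
</fieldType>
```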
Thanks Tomás,
My example should have read...
q=+(field1:this field1:that) +(field2:other)
I'm using edismax.
so with this approach, the boosts as specified in solrconfig qf will
remain in place?
---
IntelCompute
Web Design Local Online Marketing
http://www.intelcompute.com
On Mon, 5 Dec
In this case, the boosts and fields in the qf parameter won't be
considered for the search. With this query Solr will search for documents
with the terms this and/or (depending on your default operator) that in
field1, and the term other in field2
On Mon, Dec 5, 2011 at 9:44 AM, Robert
So I need to explicitly set the boosts in the query?
ie
q=+(field1:this^2 field1:that thing^4) +(field2:other^3)
---
IntelCompute
Web Design Local Online Marketing
http://www.intelcompute.com
On Mon, 5 Dec 2011 09:49:34 -0300, Tomás Fernández Löbbe
tomasflo...@gmail.com wrote:
In this
You can try bumping up the timeouts in your SolrJ program, the
SolrServer has a bunch of timeout options.
You can pretty easily tell if the optimize has carried through
anyway, your index files should have been reduced
substantially. But I'm pretty sure it's completing successfully.
Why call it
Have you looked at:
http://wiki.apache.org/solr/SolrCaching
?
But no, they aren't used for the same thing. The people
who work on the code work hard to keep the memory
use down.
Best
Erick
On Fri, Dec 2, 2011 at 4:37 AM, RT RT robt7...@yahoo.co.uk wrote:
Hi,
I'm trying to understand caching,
Hi
Reading http://wiki.apache.org/solr/SolrReplication I notice the
pollInterval (guess it should have been pullInterval) on the slaves.
That indicates to me that indexed information is not really pushed from
master to slave(s) on events defined by replicateAfter (e.g. commit),
but that it
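That reading matches the usual slave-side setup: the slave polls and pulls; nothing is pushed by the master. It looks roughly like this (the masterUrl and interval are placeholders):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- the slave pulls index files from the master -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- check the master every 60 seconds (hh:mm:ss) -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```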
Right, the Solr/Lucene query syntax isn't true Boolean logic, so
applying all the neat DeMorgan's rules is sometimes surprising.
The first form takes all records with event dates that fall outside your range
and inverts the results.
The second selects all documents that fall in the indicated
Well, Solr is a text search engine, and a good one. But this sure
feels like a problem that RDBMSs were built to handle. Why do
you want to do this? Is your current performance a problem?
Are you blowing your space resources out of the water? Do you
want to distribute your app to places not
There's no good way to say to Solr "Use only this
much memory for searching." You can certainly
limit the size somewhat by configuring your caches
to be small. But if you're sorting, then Lucene will
use up some cache space etc.
Are you actually running into problems?
Best
Erick
On Fri, Dec 2,
Some details please. Are you indexing and searching
on the same machine? How are you committing?
After every add? Via commitWithin? Via autocommit?
What version of Solr? What environment?
You might review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Fri, Dec 2, 2011 at 2:35 PM,
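For reference, the commit variants Erick lists map to configuration like the following in solrconfig.xml (the thresholds shown are illustrative, not recommendations):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- autocommit: commit after 10,000 added docs or 60 s, whichever comes first -->
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```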
Is there a formal way to transfer the data to a database or file format
from which it can be reloaded? I would say an export to a CSV file (which
could become huge) and then reloading it from that?
Not quite sure what you mean by that. The data is not *in* the
solr index if it's not stored, so
Why not just use the first form of the document
and just facet.field=category? You'll get
two different facet counts for XX and YY
that way.
I don't think grouping is the way to go here.
Best
Erick
On Sat, Dec 3, 2011 at 6:43 AM, Juan Pablo Mora jua...@informa.es wrote:
I need to do some
On 05.12.2011 14:28, Per Steffensen wrote:
Hi
Reading http://wiki.apache.org/solr/SolrReplication I notice the
pollInterval (guess it should have been pullInterval) on the slaves.
That indicates to me that indexed information is not really pushed from
master to slave(s) on events defined by
Have you looked at the pf (phrase fields)
parameter of edismax?
http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29
Best
Erick
On Sat, Dec 3, 2011 at 7:04 PM, alx...@aim.com wrote:
Hello,
Here is my request handler
<requestHandler name="search" class="solr.SearchHandler">
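For context, pf is configured next to qf in the handler defaults; a hypothetical example (field names invented):

```xml
<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^2 body</str>
    <!-- pf boosts docs where the query terms also occur as a phrase -->
    <str name="pf">title^10 body^3</str>
  </lst>
</requestHandler>
```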
Have you considered specifying a boost
function in your handler instead? See:
http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29
Best
Erick
On Sun, Dec 4, 2011 at 12:43 AM, Zac Smith z...@trinkit.com wrote:
Hi,
I think this is a pretty common requirement so hoping
Hi Kashif,
that is not possible in Solr. The facets are always based on all the
documents matching the query.
But there is a workaround:
1) Do a normal query without facets (you only need to request doc ids
at this point)
2) Collect all the IDs of the documents returned
3) Do a second query for
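The three-step workaround can be sketched in code; only the construction of the second query's filter from the first page's IDs is shown, and the field name id is an assumption:

```python
def page_filter(doc_ids, id_field="id"):
    """Build an fq restricting the facet query to one page of documents."""
    if not doc_ids:
        return "-*:*"  # match nothing if the first query returned no docs
    clause = " OR ".join('%s:"%s"' % (id_field, d) for d in doc_ids)
    return "(%s)" % clause

# Steps 1/2 (not shown): query with start=0&rows=10&fl=id and collect the ids.
ids = ["doc1", "doc2", "doc3"]

# Step 3: facet only over those ids, e.g.
#   q=*:*&fq=<filter below>&rows=0&facet=true&facet.field=<your field>
print(page_filter(ids))  # (id:"doc1" OR id:"doc2" OR id:"doc3")
```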
Because I need the count and the result to return back to the client side. Both
grouping and faceting offer me a solution to do that, but my doubt is
about performance ...
With Grouping my results are:
"grouped": {
  "category": {
    "matches": ...,
    "groups": [{
Guess that is the whole point. Guess that I do not have to replicate
configuration files, since SolrCloud (AFAIK) does not use local
configuration files but information in ZK. And then it gets a little hard to
guess how to do it, since the explanation on http://wiki.apache.org/solr/*
1) Try adding debugQuery=on and see if the query parses the way
you expect.
2) Look at your admin/analysis page to see if your fields are getting
parsed the way you think.
3) Look in your admin/schema page to see if the actual terms are
what you expect...
Yeah, it's kind of daunting when
What does the version field need to look like? Something like?
<field name="_version_" type="string" indexed="true" stored="true"
required="true" />
On Sun, Dec 4, 2011 at 2:00 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Fri, Dec 2, 2011 at 10:48 AM, Mark Miller markrmil...@gmail.com wrote:
Yes, and without doing much in the way of queries, either. Basically, our
test data has large numbers of distinct terms, each of which can be large
in themselves. Heap usage is a straight line -- up -- 75 percent of the
heap is consumed with byte[] allocations at the leaf of an object graph
On Mon, Dec 5, 2011 at 9:21 AM, Jamie Johnson jej2...@gmail.com wrote:
What does the version field need to look like?
It's in the example schema:
<field name="_version_" type="long" indexed="true" stored="true"/>
-Yonik
http://www.lucidimagination.com
Thanks for answering
Mark Miller wrote:
Guess that is the whole point. Guess that I do not have to replicate
configuration files, since SolrCloud (AFAIK) does not use local
configuration files but information in ZK. And then it gets a little hard to
guess how to do it, since the explanation
Hi All,
If you've wanted a full time job working on Lucene or Solr, we have two
positions open that just might be of interest. The job descriptions are below.
Interested candidates should submit their resumes off list to
care...@lucidimagination.com.
You can learn more on our website:
Hello All,
I have my field description listed below, but I don't think it's pertinent, as
my issue seems to be with the query parser.
I'm currently using an edismax subquery clause to help with my searching as
such:
_query_:{!type=edismax qf='ref_expertise'}\(nonlinear OR soliton\) AND
Hello,
When I add a synonym to synonyms.txt it works fine. For example:
foo => bar (when searching for foo, bar also gets found)
But this won't work (assume bar-bar is indexed somewhere):
foo => bar-bar
What should I do to enable searching for synonyms with dashes in them?
Thank you,
Zoran
That seems pretty straightforward. Thanks!
2011/12/5 Tomás Fernández Löbbe tomasflo...@gmail.com:
You could try adding a
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
Regards,
Tomás
On Monday, 05.12.2011 at 08:11 -0500, Erick Erickson wrote:
You can try bumping up the timeouts in your SolrJ program, the
SolrServer has a bunch of timeout options.
You can pretty easily tell if the optimize has carried through
anyway, your index files should have been reduced
Hi,
My name is Ike Achebe and I am a Developer Analyst with the Johnson County
Library. I'm actually researching better and less expensive alternatives to
the Google Search Appliance, which is currently our search engine.
Fortunately, I have come across a variety of blogs recommending
A colleague came to me with a problem that intrigued me. I can see
partly how to solve it with Solr, but I'm looking for insight into solving
the last step.
The problem:
1) Start from a set of text transcriptions of videos where there is a
timestamp associated with each word.
2) Index into Solr
Assuming you are using Drupal for the website, you can have Solr set
up and integrated with Drupal in 5 minutes for local development
purposes.
See: https://drupal.org/node/1358710 for a pre-configured download.
-Peter
On Mon, Dec 5, 2011 at 11:46 AM, Achebe, Ike, JCL
ache...@jocolibrary.org
Thanks Yonik, must have just missed it.
A question about adding a new shard to the index. I am definitely not
a hashing expert, but the goal is to have a uniform distribution of
buckets based on what we're hashing. If that happens then our shards
would reach capacity at approximately the same
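The uniform-distribution goal, and why growing the cluster is awkward, can be illustrated with a small sketch (plain MD5-modulo hashing here is an illustration, not the actual SolrCloud code):

```python
import hashlib

def shard_for(doc_id, num_shards):
    """Map a document id to a shard with a stable, well-mixed hash."""
    h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
    return h % num_shards

ids = ["doc-%d" % i for i in range(10000)]

# A good hash fills the buckets at roughly the same rate,
# so all shards reach capacity at about the same time.
counts = [0] * 4
for d in ids:
    counts[shard_for(d, 4)] += 1

# But going from 4 to 5 shards remaps most documents under naive modulo
# hashing, which is why splitting one existing shard in two is the
# cheaper way to grow.
moved = sum(1 for d in ids if shard_for(d, 4) != shard_for(d, 5))
print(counts, moved)
```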
Currently, solr grouping (http://wiki.apache.org/solr/FieldCollapsing)
sorts groups by the score of the top document within each group. E.g.
[...]
"groups": [{
  "groupValue": "81cb63020d0339adb019a924b2a9e0c2",
  "doclist": {"numFound": 9, "start": 0, "maxScore": 4.729042, "docs": [
    {
On Mon, Dec 5, 2011 at 1:29 PM, Jamie Johnson jej2...@gmail.com wrote:
In this
situation I don't think splitting one shard would help us we'd need to
split every shard to reduce the load on the burdened systems right?
Sure... but if you can split one, you can split them all :-)
-Yonik
Yes completely agree, just wanted to make sure I wasn't missing the obvious :)
On Mon, Dec 5, 2011 at 1:39 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Mon, Dec 5, 2011 at 1:29 PM, Jamie Johnson jej2...@gmail.com wrote:
In this
situation I don't think splitting one shard would help us
On Mon, Dec 5, 2011 at 6:23 AM, Per Steffensen st...@designware.dk wrote:
Will it be possible to maintain a how-to-use section on
http://wiki.apache.org/solr/NewSolrCloudDesign with examples, e.g. like the
ones on http://wiki.apache.org/solr/SolrCloud,
Yep, it was on my near-term todo list to
*pk*: The primary key for the entity. It is *optional* and only needed
when using delta-imports. It has no relation to the uniqueKey defined in
schema.xml, but they both can be the same.
When used in a nested entity, is the pk the primary key column of the
join table or the key used for joining?
I know I'm using Solr for a task that is better suited to the DB, but I'm
doing this for reasons related to the overall design of my system. My DB is
going to become very large over time and it is constantly being updated via
Hadoop jobs that collect, analyze some data and generate
I am crawling a bunch of HTML pages within a site (using ManifoldCF)
that will be sent to Solr for indexing. I want to extract some content
out of the pages, each piece of content to be stored as its own field
BEFORE indexing in Solr.
My guess would be that I should use a Document
Michael -
I was following your discussion on the MCF list as well.
What kind of information do you want to extract from the HTML pages? The UIMA
thing would be fairly heavy weight. The simplest thing on the Solr-side of the
equation would be to write an UpdateProcessor(Factory) and
On 12/05/2011 01:52 PM, Michael Kelleher wrote:
I am crawling a bunch of HTML pages within a site (using ManifoldCF),
that will be sent to Solr for indexing. I want to extract some
content out of the pages, each piece of content to be stored as its
own field BEFORE indexing in Solr.
My
Hello Erik,
I will take a look at both:
org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor
and
org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessor
and figure out what I need to extend to handle processing in the way I
am looking for. I am
I'm not really sure how to title this, but here's what I'm trying to do.
I have a query that creates a rather large dictionary of codes that are shared
across multiple fields of a base entity. I'm using the
CachedSqlEntityProcessor, but I was curious if there was a way to join this
multiple
Martijn,
I'm just seeing this reply today, please excuse the late reply.
I tried your suggestion, and I do get results back, but I get back a list of
Users when I am instead trying to get back a list of Posts.
Is it not possible to arbitrarily sort by either side of the join in solr?
Have
Is it possible to extract content for file types that Tika doesn’t support
without changing and rebuilding Tika? Do I need to specify a tika.config
file in the solrconfig.xml file, and if so, what is the format of that file?
One example that I’m trying to solve is for a document management
Hi Erick,
After reading more about the pf param I increased the boosts a few times, and
this solved options 2, 3 and 4, but not 1. As an example, for the phrase
"newspaper latimes", latimes.com is not even in the results, so I cannot boost
it to first place, and changing the mm param to <str name="mm">1<-1 5<-2 6<90%</str>
On 12/4/2011 12:41 AM, Ted Dunning wrote:
Read the papers I referred to. They describe how to search fairly enormous
corpus with an 8GB in-memory index (and no disk cache at all).
They would seem to indicate moving away from Solr. While that would not
be entirely out of the question, I
: I am using solr 3.4 and configured my DataImportHandler to get some data from
: MySql as well as index some rich document from the disk.
...
: But after some time i get the following error in my error log. It looks like
: a class missing error, Can anyone tell me which poi jar version
: Have you looked at:
: http://wiki.apache.org/solr/SolrCaching
this page was actually a little light on details about fieldValueCache, so
I tried to fill in some of the blanks in the latest version.
https://wiki.apache.org/solr/SolrCaching#fieldValueCache
-Hoss
Jeff,
I'm not entirely understanding everything you've been asking about (in
terms of what your ultimate goal is), but as far as the JoinQParser
specifically...
:
On Mon, Dec 5, 2011 at 3:28 PM, Shawn Heisey s...@elyograg.org wrote:
On 12/4/2011 12:41 AM, Ted Dunning wrote:
Read the papers I referred to. They describe how to search fairly
enormous
corpus with an 8GB in-memory index (and no disk cache at all).
They would seem to indicate moving
: Right, the Solr/Lucene query syntax isn't true Boolean logic, so
: applying all the neat DeMorgan's rules is sometimes surprising.
And more specifically, mixing boolean operators (AND/OR) with prefix
operators (+/-) is a recipe for disaster. In an expression like this..
XXX OR -YYY
Hi,
add features corresponding to stuff that we used to use in ElasticSearch
Does that mean you have used ElasticSearch but decided to try SolrCloud instead?
I'm also looking at a distributed solution. ElasticSearch just seems much
further along than SolrCloud. So I'd be interested to hear
: Then when I match a new Document with Red, Big, Document 1 should be top,
: Document 2 in the middle, and Document 3 in the bottom. But I still want
: Document 3 to show up in result because it still matches on Red.
:
: If I simply add opposite tags in the query with 1 boost (search for Red
:
This sort of works, although it feels kind of hacky and I am not sure how
robust it is for more complicated situations. Is there any reason behind not
supporting a proper negative boost? Is there any mathematical restriction?
Shawn,
Question which is a bit off topic. You mention your algorithm for
sharding, how do you handle updates or do you not have to deal with
that in your scenario?
On Sat, Dec 3, 2011 at 1:54 PM, Shawn Heisey s...@elyograg.org wrote:
In another thread, something was said that sparked my
On Mon, Dec 5, 2011 at 6:23 AM, Per Steffensen st...@designware.dk wrote:
and add features
What's the list of features you are looking for?
--
- Mark
http://www.lucidimagination.com
On 12/5/2011 6:57 PM, Jamie Johnson wrote:
Question which is a bit off topic. You mention your algorithm for
sharding, how do you handle updates or do you not have to deal with
that in your scenario?
I have a long running program based on SolrJ that handles updates. Once
a minute, I run
Hi Chris:
Thanks a lot for your response. This is the kind of information I'm looking
for.
What you said about faceting is the key. I want to use my existing edismax
configuration to create the scored document result set of type Y. I don't want
to affect their scores, but for each document
It looks like https://issues.apache.org/jira/browse/SOLR-2382 or even
https://issues.apache.org/jira/browse/SOLR-2613.
I guess by using SOLR-2382 you can specify your own SortedMapBackedCache
subclass which is able to share your Dictionary.
Regards
On Tue, Dec 6, 2011 at 12:26 AM, Brent Mills
Hi
I am trying to upgrade my Solr version from 1.4 to 3.2, but it's giving me the
exception below. I have checked the solr home path and it is correct. Please help.
SEVERE: Could not start Solr. Check solr/home property
java.lang.NoSuchMethodError: