Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I
fixed in 4.10. Can you try a newer release?
On Wed, May 20, 2015 at 6:51 AM, Shawn Heisey apa...@elyograg.org wrote:
An unusual problem is happening with the DIH on a field that is an
unsigned BIGINT in the MySQL
Write a custom update processor and include it in your update chain.
You will then have the ability to do anything you want with the entire
input document before it hits the code to actually do the indexing.
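A minimal sketch of such a processor, assuming the Solr 4.x API (the class name and field name here are hypothetical, and a matching UpdateRequestProcessorFactory plus a solrconfig.xml chain entry are also needed):

```java
import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

// Hypothetical processor that touches up a field before indexing.
public class BigIntFixupProcessor extends UpdateRequestProcessor {

  public BigIntFixupProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    // Illustrative fixup: normalize a value before it reaches the index
    Object v = doc.getFieldValue("big_id");
    if (v != null) {
      doc.setField("big_id", v.toString().trim());
    }
    super.processAdd(cmd); // pass the document on down the chain
  }
}
```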
This sounded like the perfect option ... until I read Jack's comment:
My
Personally, I see this as a limit of the dataimporthandler. It gets you
started, but when your needs get at all complicated, it can't help you.
I would encourage you to write your own indexing code. A little bit of
code that reads over your database, sorts it out in the right way, and
pushes it
Requesting Solr experts again to suggest some solutions to my above problem,
as I am not able to solve it.
On Tue, May 12, 2015 at 11:04 AM, Naresh Yadav nyadav@gmail.com wrote:
Thanks Andrew, You got my problem precisely But solutions you suggested
may not work for me.
In my API i get
On 5/20/2015 12:06 AM, Shalin Shekhar Mangar wrote:
Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I
fixed in 4.10. Can you try a newer release?
I can't upgrade yet. I am using a plugin that hasn't been verified
against anything newer than 4.9. When a new version
On 19/05/15 14:47, Alessandro Benedetti wrote:
Hi Bram,
what do you mean with :
I
would like it to provide the unique value myself, without having the
deduplicator create a hash of field values .
This is not deduplication, but simple document filtering based on a
constraint.
In the
Hello,
might anyone suggest a field type with which I may do both a full text
search (i.e. there is an analyzer including a tokenizer) and apply a
collation?
An example for what I want to do:
There is a field composer for which I passed the value Dvořák, Antonín.
I want the following queries to
What the Solr de-duplication offers you is to calculate, for each input
document, a hash (based on a set of fields).
You can then select two options:
- Index everything; documents with the same signature will be equal
- Avoid the overwriting of duplicates.
How the similarity hash is calculated
Thanks Jack.
In my case there is only one document - Foo Foo is in bar
As per your comment, I should expect TF to be 2.
But I am getting one.
Is there any check where, if one match is a subset of another, it is
calculated only once?
My class extends DefaultSimilarity.
Cheers
Ariya Bala S
On Wed, May 20,
Hi,
I have made custom class for scoring the similarity
(TermFrequencyBiasedSimilarity).
The score was deduced by considering just the TF part (achieved by setting
IDF=1).
Question is:
-
*Document content:* Foo Foo is in bar
*Search query:* Foo bar
*slop:* 3
With Slop 3, There
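The arrangement described above (TF-only scoring obtained by forcing IDF to 1) can be sketched like this, assuming Lucene 4.x's DefaultSimilarity:

```java
import org.apache.lucene.search.similarities.DefaultSimilarity;

public class TermFrequencyBiasedSimilarity extends DefaultSimilarity {
  @Override
  public float idf(long docFreq, long numDocs) {
    // Neutralize IDF so the score is driven by term frequency alone
    return 1.0f;
  }
}
```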
Yes.
tf is both 1 and 2 - tf is per document, which is 1 for the first document
and 2 for the second document.
See:
http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
-- Jack Krupansky
On Wed, May 20, 2015 at 6:13 AM, ariya bala
Hi everyone,
I’ve been reading answers around this problem but I wanted to make sure that
there is another way out of my problem. The thing is that the solution
shouldn’t be on index-time, involve indexing a new field or changing this
multi-valued field to a single-valued one.
Problem:
I
Hi everyone,
My solution requires that users in group-A can only search against a set of
fields-A and users in group-B can only search against a set of fields-B,
etc. There can be several groups, as many as 100 even more. To meet this
need, I build my search by passing in the list of fields via
Hi,
I am having some problem while grouping the result set. I have a Solr schema
with fields like these:
<field name="id" type="string" indexed="false" stored="true" required="true" />
<field name="product" type="string" indexed="true" stored="true" required="true" />
<field name="vendor" type="string" indexed="true"
Please ignore.
On Wed, May 20, 2015 at 2:45 PM, ariya bala ariya...@gmail.com wrote:
Thanks Jack.
In my case there is only one document - Foo Foo is in bar
As per your comment, I should expect TF to be 2.
But I am getting one.
Is there any check where if one match is a subset of other, is
Use update processor to add number of tags per doc. eg check
CountFieldValuesUpdateProcessorFactory
Doc1 - tags:T1 T2 ; tagNum: 2
Doc2 - tags:T1 T3 ; tagNum: 2
Doc3 - tags:T1 T4 ; tagNum: 2
Doc4 - tags:T1 T2 T3 ; tagNum: 3
then when you search for tags you need to get the number of tags matched
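A sketch of the corresponding solrconfig.xml chain (field names tags/tagNum follow the example above; the clone step is an assumption, since CountFieldValuesUpdateProcessorFactory replaces the values of the field it points at with their count, so the tags are typically cloned into tagNum first):

```xml
<updateRequestProcessorChain name="count-tags">
  <!-- copy the multi-valued tags into tagNum -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">tags</str>
    <str name="dest">tagNum</str>
  </processor>
  <!-- replace the cloned values with how many there are -->
  <processor class="solr.CountFieldValuesUpdateProcessorFactory">
    <str name="fieldName">tagNum</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```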
On Wed, May 20, 2015 at 12:59 PM, Bram Van Dam bram.van...@intix.eu wrote:
Write a custom update processor and include it in your update chain.
You will then have the ability to do anything you want with the entire
input document before it hits the code to actually do the indexing.
This
On Thu, May 14, 2015 at 12:01 AM, Tom Devel deve...@gmail.com wrote:
I tried to repost the whole modified document (the parent and ALL of its
children as one file), and it seems to work on a small toy example, but of
course I cannot be sure for a larger instance with thousands of documents,
Also, is this 1500 fields that are always populated, or are there really a
larger number of different record types, each with a relatively small
number of fields populated in a particular document?
Answer: This is a large number of different record types, each with a
relatively small number of
I was able to get what I wanted by processing the column in question as
massaged text, so that it was a comma-delimited series of IDs, and then
passing that to a subentity query that went something like: SELECT value
FROM othertable WHERE id IN (${master.ids}).
It's slow but I think it's getting
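The arrangement described might look like this in a DIH data-config.xml (table, column, and field names here are illustrative, not from the original message):

```xml
<entity name="master" query="SELECT id, massaged_ids AS ids FROM mastertable">
  <!-- ids is the comma-delimited series of IDs produced by the "massaged text" step -->
  <entity name="values"
          query="SELECT value FROM othertable WHERE id IN (${master.ids})">
    <field column="value" name="value_s"/>
  </entity>
</entity>
```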
Hello fellow Solr users,
We're writing a book on applied Lucene search relevance -- Relevant
Search (http://manning.com/turnbull). We want to teach you to improve the
quality of your Solr search results! We're trying to bridge the academic
side of Information Retrieval from books like Intro. to
Thanks Shawn.
I have already switched to using POST because I need to send a long list of
data in qf. My question isn't about POST / GET, it's about Solr and
Lucene having to deal with such long list of fields. Here is the text of
my question reposted:
Given the above, beside the fact that a
Thank you all... You all are experts...
I will go with double as this seems to be more feasible.
Regards
On Tue, May 19, 2015 at 7:26 PM, Walter Underwood wun...@wunderwood.org
wrote:
A field type based on BigDecimal could be useful, but that would be a fair
amount more work.
Double is
Hello,
Here is the patch
https://issues.apache.org/jira/browse/SOLR-5882
On Tue, May 12, 2015 at 1:11 PM, StrW_dev r.j.bamb...@structweb.nl wrote:
Hi
Is it possible to configure the scoreMode of the Parent block join query
parser (ToParentBlockJoinQuery)?
It seems it's set to none, while
On 5/20/2015 9:24 AM, Steven White wrote:
I have already switched to using POST because I need to send a long list of
data in qf. My question isn't about POST / GET, it's about Solr and
Lucene having to deal with such long list of fields. Here is the text of
my question reposted:
Given
Shawn I agree with you, but, some of the decisions in the corporate world
are handed down through higher powers/pay grade, who do not always like to
hear counter arguments. For example, this is the same reason why
govt/federal restrict tech folks only use certified DBs/App Servers like
Oracle,WSAD
I highly recommend using boost= in edismax rather than bq=. The multiplicative
boost is stable with a wide range of scores. bq is additive and has problems
with high or low scores.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On May 20, 2015, at 1:04
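For example (parameter values are illustrative), the additive and multiplicative forms look like:

```
# additive: bq adds the boost query's score to the main score
q=ipod&defType=edismax&bq=category:electronics^5

# multiplicative: boost multiplies the main score by a function
q=ipod&defType=edismax&boost=if(termfreq(category,'electronics'),5,1)
```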
could i do that the same way as my mention of using bq? the docs aren't
very rich in their example or explanation of boost= here:
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
thanks!
--
*John Blythe*
Product Manager Lead Developer
251.605.3071 |
We've been using Solr a bit now for a year or so, 4.6 is the oldest version of
Solr we've deployed. We're currently working through the process we'll use to
upgrade to 5.1, an upgrade we need for the new facet.stats capabilities.
Reading the Major Changes document, it indicates that there is
Possibly you changed the field type sometime without completely
blowing away your index and re-indexing from scratch? Based on:
unexpected docvalues type SORTED_SET for field 'vendor' (expected=SORTED)
Because you can't group on multi-valued fields, which is I think
what's going on here.
Either
I believe that boost is a superset of the bq functionality.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On May 20, 2015, at 1:16 PM, John Blythe j...@curvolabs.com wrote:
could i do that the same way as my mention of using bq? the docs aren't
very
What is it? There isn't one except zkcli and variants ;).
Things are all automatic once you get things _to_ Zookeeper, but
pushing the config sets up is a manual process. The usual process is
to have the configs in some VCS somewhere so they're safe, and do the
usual checkout/edit/checkin and at
Hi Bjorn,
solr.ICUCollationField is useful for *sorting*, and you cannot sort on
tokenized fields.
Your example looks like diacritics insensitive search.
Please see : ASCIIFoldingFilterFactory
Ahmet
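A field type along those lines (a sketch; the exact analyzer chain is an assumption) that would let a query for "Dvorak" match "Dvořák":

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds diacritics: ř -> r, á -> a, etc. -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```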
On Wednesday, May 20, 2015 2:53 PM, Björn Keil deeph...@web.de wrote:
Hello,
might anyone
On 5/20/2015 2:54 PM, John Blythe wrote:
new question re edismax: when i turn it on (in solr admin) my score goes
way down, from 772 to 4.9.
what in the edismax query parser would account for that huge nosedive?
Scores are 100% relative, and the number only has meaning in the context
of
thanks guys.
it doesn't depend on absolute scores, but it is leaning on the score as a
confident metric of sorts. we've found some good standard deviation info
when plotting out the accuracy of the top result and the relative score
with the analyzers currently in production and hope to strengthen
Hi all,
I've been fine tuning our current Solr implementation the last week or two
to get more precise results. We are trying to get our implementation
accurate enough to serve as a lightweight machine learning (obviously a
misnomer) implementation of sorts. Actual user generated searching is far
bq: Keep a copy of the value into a non-multi-valued field, using an
update processor: This involves indexing a new field
Why can't you do this? You can't re-index the data perhaps? It's by
far the easiest solution
Best,
Erick
On Wed, May 20, 2015 at 2:45 AM, Fernando Agüero
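The "copy into a non-multi-valued field" option quoted above can be done with stock processors (field names are illustrative):

```xml
<updateRequestProcessorChain name="single-valued-copy">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">tags</str>
    <str name="dest">tag_single</str>
  </processor>
  <!-- keep only the first value so tag_single can be single-valued -->
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">tag_single</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```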
cool, will check into it some more via testing
--
*John Blythe*
Product Manager Lead Developer
251.605.3071 | j...@curvolabs.com
www.curvolabs.com
58 Adams Ave
Evansville, IN 47713
On Wed, May 20, 2015 at 3:22 PM, Walter Underwood wun...@wunderwood.org
wrote:
I believe that boost is a
I was going to post the same advice. If your approach depends on absolute
scores, you need to change your approach.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On May 20, 2015, at 2:09 PM, Shawn Heisey apa...@elyograg.org wrote:
On 5/20/2015 2:54
new question re edismax: when i turn it on (in solr admin) my score goes
way down, from 772 to 4.9.
what in the edismax query parser would account for that huge nosedive?
--
*John Blythe*
Product Manager Lead Developer
251.605.3071 | j...@curvolabs.com
www.curvolabs.com
58 Adams Ave
Data scale and request rate should decide between block joins, plain joins,
and field collapsing.
On Thu, Apr 30, 2015 at 1:07 PM, roySolr royrutten1...@gmail.com wrote:
Hello,
I have a situation and i'm a little bit stuck on the way how to fix it.
For example the following data structure:
*Deal*
Yeah a copyField into one could be a good space/time tradeoff. It can be
more manageable to use an all field for both relevancy and performance, if
you can handle the duplication of data.
You could set tie=1.0, which effectively sums all the matches instead of
picking the best match. You'll still
I'm absolutely sure that you need to group them externally in the indexer,
e.g. with a child VALUES entity in DataImportHandler.
On Mon, May 11, 2015 at 9:52 PM, Vishal Swaroop vishal@gmail.com
wrote:
Need your valuable inputs...
I am indexing data from database (one table) which is in this
Seems like the attachments get stripped off. Anyway, here is the 4.7 log
on startup
INFO - 2015-05-20 10:35:45.786; org.eclipse.jetty.server.Server;
jetty-8.1.10.v20130312
INFO - 2015-05-20 10:35:45.804;
org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor
Hi,
I need a little clarification on configSets in solr 5.x.
According to this page:
https://cwiki.apache.org/confluence/display/solr/Config+Sets
I can create named configSets to be shared by other cores. If I create them
using this method AND am operating in SolrCloud mode, will it
Hi Doug,
Your blog write up on relevancy is very interesting, I didn't know this.
Looks like I have to go back to my drawing board and figure out an
alternative solution: somehow get those group-based-fields data into a
single field using copyField.
Thanks
Steve
On Wed, May 20, 2015 at 11:17
Thanks for calling out maxBooleanClauses. The current default of 1024 has
not caused me any issues (so far) in my testing.
However, you probably saw Doug Turnbull's reply; it looks like my relevance
will suffer.
Steve
On Wed, May 20, 2015 at 11:42 AM, Shawn Heisey apa...@elyograg.org wrote:
Shawn Heisey apa...@elyograg.org wrote:
I'm wondering ... if Jetty is good enough for the Google App Engine, why
isn't it good enough for your infrastructure standards?
Replace Jetty vs. Glassfish with Linux vs. Windows, Eclipse vs. Idea, emacs vs.
vi, Java vs. C#...
There are many reasons
John:
The spam filter is very aggressive. Try changing the type to plain
text rather than rich text or html...
Best,
Erick
On Wed, May 20, 2015 at 2:35 PM, John Blythe j...@curvolabs.com wrote:
thanks guys.
it doesn't depend on absolute scores, but it is leaning on the score as a
confident
Good call thank you
On Wed, May 20, 2015 at 5:15 PM, Erick Erickson erickerick...@gmail.com
wrote:
John:
The spam filter is very aggressive. Try changing the type to plain
text rather than rich text or html...
Best,
Erick
On Wed, May 20, 2015 at 2:35 PM, John Blythe j...@curvolabs.com
On 5/20/2015 3:35 PM, John Blythe wrote:
regarding the new question itself, i'd replied to this thread w more info
but had the system kick it back to me for some reason. maybe i replied too
much too soon? anyway, it ended up being a result of my query still being
in the primary query box
Well, let's see the code. Standard updates should replace the previous
docs, reindexing the same unique ID with fewer fields should show
fewer fields. So something's weird here.
Although do, just for yucks, issue a query on some of the unique ids
in question, I'd be curious if you get more than
I'm reindexing Mongo docs into SolrCloud. The new docs have had a few fields
removed so upon reindexing those fields should be gone in Solr. They are
not. So the result is a new doc merged with an old doc rather than a
replacement which is what I need.
I do not know whether the issue is with
The uniqueKey value is the same.
The new documents contain fewer fields than the already indexed ones. Could
this cause the updates to be treated as atomic? With the persisting fields
treated as un-updated?
Routing should be implicit since the collection was created using numShards.
Many
On 5/20/15, 8:21 AM, Shawn Heisey wrote:
As of right now, there is still a .war file. Look in the server/webapps
directory for the .war, server/lib/ext for logging jars, and server/resources
for the logging configuration. Consult your container's documentation to learn
where to place these
Never mind. I found that thread. Sorry for the noise.
On 5/20/15, 5:56 PM, TK Solr wrote:
On 5/20/15, 8:21 AM, Shawn Heisey wrote:
As of right now, there is still a .war file. Look in the server/webapps
directory for the .war, server/lib/ext for logging jars, and server/resources
for the
My SolrCloud cluster isn't reassigning the collections leaders from
downed cores--the downed cores are still listed as the leaders. The
cluster has been in the state for a few hours and the logs continue to
report No registered leader was found after waiting for 4000ms. Is
there a way to force
GC is operating the way I think it should but I am lacking memory. I am
just surprised because indexing is performing fine (documents going in) but
deletions are really bad (documents coming out).
Is it possible these deletes are hitting many segments, each of which I
assume must be re-built?
Yep. Solr/Lucene strives for one major revision backwards
compatibility. So any 5x should be able to read any index produced
with 4x, but no index produced with 3x.
Best,
Erick
On Wed, May 20, 2015 at 2:44 PM, Craig Longman clong...@iconect.com wrote:
We've been using Solr a bit now for a year
A few things:
Scores aren't confidence metrics, they are relative rankings, in
relation to a single resultset, that's all.
Secondly for edismax, boost does multiplicative boosting (whatever
function you provide, the score is multiplied by that), whereas bf does
additive boosting.
Upayavira
On
On 5/20/2015 4:43 PM, tuxedomoon wrote:
I'm reindexing Mongo docs into SolrCloud. The new docs have had a few fields
removed so upon reindexing those fields should be gone in Solr. They are
not. So the result is a new doc merged with an old doc rather than a
replacement which is what I
I have a collection with 1 billion documents and I want to delete 500 of
them. The collection has a dozen shards and a couple replicas. Using Solr
4.4.
Sent the delete query via HTTP:
http://hostname:8983/solr/my_collection/update?stream.body=<delete><query>source:foo</query></delete>
Took a couple
On 5/20/2015 5:41 PM, Ryan Cutter wrote:
I have a collection with 1 billion documents and I want to delete 500 of
them. The collection has a dozen shards and a couple replicas. Using Solr
4.4.
Sent the delete query via HTTP:
http://hostname:8983/solr/my_collection/update?stream.body=
On 5/20/2015 5:57 PM, Ryan Cutter wrote:
GC is operating the way I think it should but I am lacking memory. I am
just surprised because indexing is performing fine (documents going in) but
deletions are really bad (documents coming out).
Is it possible these deletes are hitting many
Shawn, thank you very much for that explanation. It helps a lot.
Cheers, Ryan
On Wed, May 20, 2015 at 5:07 PM, Shawn Heisey apa...@elyograg.org wrote:
On 5/20/2015 5:57 PM, Ryan Cutter wrote:
GC is operating the way I think it should but I am lacking memory. I am
just surprised because
I have read that solr 5.x has moved away from deployable WAR architecture
to a runnable Java Application architecture. Our infrastructure/standards
folks are adamant about not running SOLR on Jetty (as we are about to
upgrade from 4.7.2 to 5.1), any ideas on how I can make it run on Glassfish
or
The uf parameter is used to specify which fields a user may query against
- the qf parameter specifies the set of fields that an unfielded query
term must be queried against. The user is free to specify fielded query
terms, like field1:term1 OR field2:term2. So, which use case are you
really
Steven,
I'd be concerned about your relevance with that many qf fields. Dismax
takes a winner takes all point of view to search. Field scores can vary
by an order of magnitude (or even two) despite the attempts of query
normalization. You can read more here
On 5/20/2015 9:07 AM, Ravi Solr wrote:
I have read that solr 5.x has moved away from deployable WAR architecture
to a runnable Java Application architecture. Our infrastructure/standards
folks are adamant about not running SOLR on Jetty (as we are about to
upgrade from 4.7.2 to 5.1), any ideas
Erick
Thanks for your response.
Logs don't seem to show any explicit errors (I have log level at INFO).
I am attaching the logs from a 4.7 start and a 5.1 start here. Note that
both logs seem to show the shards as Down initially but for 5.1, the
state change to Active later on.
Also, note that
On 5/20/2015 6:27 AM, Steven White wrote:
My solution requires that users in group-A can only search against a set of
fields-A and users in group-B can only search against a set of fields-B,
etc. There can be several groups, as many as 100 even more. To meet this
need, I build my search by