A fuzzy query foo~ defaults to a similarity of 0.5, i.e. equal to foo~0.5
just as an FYI, this isn't true in trunk (4.0) any more.
the defaults are changed so that it never enumerates the entire
dictionary (slow) like before, see:
https://issues.apache.org/jira/browse/LUCENE-2667
so,
Hi,
I have a case where I use DisMax pf to boost on phrase match in a field. I
use omitNorms=true to avoid length normalization to mess with my scores.
However, for some documents the phrase foo bar occurs more than once in
the same field, and I get an unintended TF boost for one of them.
Hi Adam,
we are using DIH to index off an SQL Server database (the freebie SQLExpress
one.. ;) ). We have defined the following in our
%TOMCAT_HOME%\solr\conf\data-config.xml:
<dataConfig>
  <dataSource type="JdbcDataSource"
              name="mssqlDatasource"
I finally figured out how to use curl to GET results, i.e. just turn all spaces
into '%20' in my type of queries. I'm using Solr spatial, and then searching in
both the default text field and a couple of columns. Works fine in the
browser.
But if I query for it using curl in PHP, there's
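The space-to-'%20' step described above can be sketched standalone (a hedged illustration using Java's `URLEncoder`, which is not what the poster used; note it emits '+' for spaces in form encoding, hence the extra replace):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeDemo {
    // Sketch: percent-encode a query string for a GET request.
    // URLEncoder targets form encoding, so spaces come out as '+';
    // replacing them with '%20' matches what the browser sends.
    static String encodeQuery(String q) {
        return URLEncoder.encode(q, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(encodeQuery("foo bar baz")); // foo%20bar%20baz
    }
}
```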
Hi,
On Wed, Dec 15, 2010 at 2:52 PM, Dennis Gearon gear...@sbcglobal.net wrote:
I finally figured out how to use curl to GET results, i.e. just turn all
spaces
into '%20' in my type of queries. I'm using Solr spatial, and then
searching in
both the default text field and a couple of
Forgive me if this seems like a dumb question but have you tried the
Apache_Solr_Service class?
http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html
It's really quite good at handling the nuts and bolts of making the HTTP
requests and decoding the responses for PHP. I almost
On Wed, Dec 15, 2010 at 3:09 AM, Jan Høydahl / Cominvent
jan@cominvent.com wrote:
Any way to disable TF/IDF normalization without also disabling positions?
see Similarity.tf(float) and Similarity.tf(int)
if you want to change this for both terms and phrases just override
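As a standalone illustration of the override being suggested (this is not the actual Lucene class; in Lucene's DefaultSimilarity, tf(freq) returns sqrt(freq)):

```java
public class TfDemo {
    // DefaultSimilarity.tf(float) returns sqrt(freq), so a phrase that
    // matches twice scores higher than one that matches once.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Returning a constant for any nonzero freq removes the TF boost
    // while positions (and thus phrase matching) stay usable.
    static float flatTf(float freq) {
        return freq > 0f ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        System.out.println(defaultTf(4f)); // 2.0
        System.out.println(flatTf(4f));    // 1.0
    }
}
```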
Hello,
According to the wiki http://wiki.apache.org/solr/LanguageAnalysis,
the light stemmers for French (solr.FrenchLightStemFilterFactory and
solr.FrenchMinimalStemFilterFactory) are only available for SOLR 3.1.
Is there a way to make them work with 1.4.1?
- - -
Additionally, there is an
One oddity is the duplicated sections:
<arr name="facet.pivot">
  <str>root_category_name,parent_category_name,category</str>
  <str>root_category_id,parent_category_id,category_id</str>
</arr>
That's in your responseHeader twice. Perhaps something fishy is caused by that?
Is this hardcoded in your
Did you try with a filter query?
Andrea Gazzarini
-Original Message-
From: sara motahari saramotah...@yahoo.com
Date: Tue, 14 Dec 2010 17:34:52
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Subject: limit the search results to one category
Hi all,
I am using a
2010/12/15 Emmanuel Bégué medu...@gmail.com:
Hello,
According to the wiki http://wiki.apache.org/solr/LanguageAnalysis,
the light stemmers for French (solr.FrenchLightStemFilterFactory and
solr.FrenchMinimalStemFilterFactory) are only available for SOLR 3.1.
Is there a way to make them work
Hi,
we're looking for some comparison-benchmarks for importing large tables from a
mysql database (full import).
Currently, a full-import of ~ 8 Million rows from a MySQL database takes around
3 hours, on a QuadCore Machine with 16 GB of
ram and a Raid 10 storage setup. Solr is running on a
Thanks All,
Testing here shortly and will report back asap.
w/r,
Adam
On Wed, Dec 15, 2010 at 4:10 AM, Savvas-Andreas Moysidis
savvas.andreas.moysi...@googlemail.com wrote:
Hi Adam,
we are using DIH to index off an SQL Server database (the freebie SQLExpress
one.. ;) ). We have defined the
Hallo Users,
I have a problem with Solr 1.4.1 on Ubuntu 10.10.
I have downloaded the new version and extracted it!
Then I copied the solr.xml from example/multicore/solr.xml to
/examples/solr/solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under
What version of Solr are you using?
Adam
2010/12/15 Robert Gründler rob...@dubture.com
Hi,
we're looking for some comparison-benchmarks for importing large tables
from a mysql database (full import).
Currently, a full-import of ~ 8 Million rows from a MySQL database takes
around 3 hours,
You're adding on the order of 750 rows (docs)/second, which isn't bad...
have you profiled the machine as this runs? Even just with top (assuming
unix)...
because the very first question is always "what takes the time: getting
the data from MySQL, indexing, or I/O?"
If you aren't maxing out your
What version of Solr are you using?
Solr Specification Version: 1.4.1
Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
Lucene Specification Version: 2.9.3
Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55
-robert
Adam
2010/12/15 Robert Gründler
We are currently running Solr 4.x from trunk.
-d64 -Xms10240M -Xmx10240M
Total Rows Fetched: 24935988
Total Documents Skipped: 0
Total Documents Processed: 24568997
Time Taken: 5:55:19.104
24.5 million docs as XML from the filesystem in less than 6 hours.
Maybe your MySQL is the bottleneck?
2010/12/15 Robert Gründler rob...@dubture.com:
The data-config.xml looks like this (only 1 entity):
<entity name="track" query="select t.id as id, t.title as title, l.title
  as label from track t left join label l on (l.id = t.label_id) where
  t.deleted = 0" transformer="TemplateTransformer">
Hi Jörg,
I think the first thing you should check is your Ubuntu encoding; the second
is file permissions (BTW why are you sudoing?).
Did you try using the bash script under example/exampledocs named post.sh
(use it like this: 'sh post.sh *.xml')?
Cheers,
Tommaso
2010/12/15 Jörg Agatz
i've benchmarked the import already with 500k records, one time without the
artists subquery, and one time without the join in the main query:
Without subquery: 500k in 3 min 30 sec
Without join and without subquery: 500k in 2 min 30 sec
With subquery and with left join: 320k in 6 min 30 sec
so
Hi all,
I'm currently using Solr and I've got a question about filtering on a lower
level than filter queries.
We want to be able to restrict the documents that can possibly be returned to a
user's query. From another system we'll get a list of document unique ids for
the user which is all the
On Wed, Dec 15, 2010 at 9:49 AM, Michael Owen michaelowe...@hotmail.com wrote:
I'm currently using Solr and I've got a question about filtering on a lower
level than filter queries.
We want to be able to restrict the documents that can possibly be returned to
a user's query. From another
It might not be practical in your case, but is it possible to get from that
other system a list of ids the user is *not* allowed to see, and somehow
invert the logic in the filter?
Regards,
-- Savvas.
On 15 December 2010 14:49, Michael Owen michaelowe...@hotmail.com wrote:
Hi all,
I'm
The custom import I wrote is a java application that uses the SolrJ
library. Basically, where I had sub-entities in the DIH config I did
the mappings inside my java code.
1. Identify a subset or chunk of the primary id's to work on (so I
don't have to load everything into memory at once) and put
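The chunking step described above can be sketched as follows (names are illustrative, not from the thread; in the real importer each chunk would drive one SQL query and one SolrJ add call):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkDemo {
    // Sketch: split a list of primary ids into fixed-size chunks so the
    // whole table never has to sit in memory at once.
    static <T> List<List<T>> chunks(List<T> ids, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            // Copy the sublist so each chunk is independent of the source.
            out.add(new ArrayList<>(ids.subList(i, Math.min(i + size, ids.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5);
        System.out.println(chunks(ids, 2)); // [[1, 2], [3, 4], [5]]
    }
}
```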
Hi,
Please give me advice on how to create custom scoring. I need the result
documents to be ordered depending on how popular each term in the document
is (popular = how many times it appears in the index) and the length of the
document (fewer terms = higher in search results).
For example, index
That was a quick response Steve!
Sounds all great! Much appreciated. Definitely think specifying a bit filter is
something that many people may find useful.
I'll have a look at Solr-2052 too.
Thanks again,
Mike
Date: Wed, 15 Dec 2010 09:57:54 -0500
Subject: Re: Lower level filtering
From:
Good point - though the inverse could be true where only a few documents are
allowed and then a big list still exists. Even in the middle ground, it's still
going to be a long list of thousands.
Thanks
Mike
Date: Wed, 15 Dec 2010 14:58:33 +
Subject: Re: Lower level filtering
From:
Here's the problem with what you're outlining:
Solr/Lucene doc ids are NOT invariant, so
the doc IDs you get from the other system
will not be directly usable in the filter. But
assuming the other system stores what you've
defined as uniqueKey you could walk the
index and get the doc IDs from
Hi again,
let's say you have 2 Solr instances which both have exactly the same
configuration (schema, solrconfig, etc).
Could it cause any troubles if we import an index from a SQL database on solr
instance A, and copy the whole
index to the datadir of solr instance B (both solr instances run
On 12/15/2010 10:05 AM, Robert Gründler wrote:
Hi again,
let's say you have 2 solr Instances, which have both exactly the same
configuration (schema, solrconfig, etc).
Could it cause any troubles if we import an index from a SQL database on solr
instance A, and copy the whole
index to the
thanks for your feedback. we can shut down both solr servers for the duration
of the copy process, and both
solr instances run the same version, so we should be ok.
i'll let you know if we encounter any troubles.
-robert
On Dec 15, 2010, at 18:11 , Shawn Heisey wrote:
On 12/15/2010 10:05 AM,
Hi all,
I've just noticed a strange behavior (or, at least, I didn't expect that),
when adding useless parenthesis to a query.
Using the lucene query parser in Solr I get no results with the query:
* ((( NOT (text:something))) AND date = 2010-12-15) *
while I get the expected results when the
I want to just pass the JSON through after qualifying the user's access to the
site.
Didn't want to spend the horsepower to receive it as PHP array syntax, run the
risk of someone putting bad stuff in the contents and running 'exec()' on it,
and then spend the extra horsepower to output
Hi everyone,
does the solr.TrieDateField support dates BC?
I indexed negative dates and I'm able to query them,
but if I store them, they show up as positive dates.
Thanks
Matthias
just making sure that you're aware of the built-in replication:
http://wiki.apache.org/solr/SolrReplication
It can pull the indexes, along with config files.
cheers,
rob
2010/12/15 Robert Gründler rob...@dubture.com:
Hi again,
let's say you have 2 solr Instances, which have both exactly
Thanks Pankaj - that was useful to know. I haven't used the query stuff
before for facets .. so that was good to know .. but the problem is still
there because I want the hierarchical counts, which is exactly what
facet.pivot does ..
so e.g. I want to count for fieldC within fieldB and even fieldB
Hi
You could use Solr's php serialized object output (wt=phps) and then convert
it to json in your php:
<?php
echo json_encode(unserialize($results_from_solr));
?>
Regards
Andrew McCombe
On 15 December 2010 17:49, Dennis Gearon gear...@sbcglobal.net wrote:
I want to just pass the JSON through
The GeoDistanceComponent triggers the problem. It may be an issue in the
component but it could very well be a Solr issue. It seems you missed a very
recent thread on this one.
https://issues.apache.org/jira/browse/SOLR-2278
I finally figured out how to use curl to GET results, i.e. just turn
I experienced this on an EmbeddedSolrServer which was running behind a
tomcat process. After restarting the tomcat process 2-3 times (implying this
recreates the SolrServer every time as well) this issue went away, but I
don't know why it ever started. It looked like the searcher shutdown was
All,
I have successfully indexed a single entity, but when I try multiple entities
the second is skipped altogether. Is there something wrong with my
config file?
<?xml version="1.0" encoding="utf-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource"
If mission.id and event.id have the same value, one will overwrite the other's
indexed document. Your ids need to be unique across all documents. I usually
have a field id_original that I map the table id to, and then for the id per
entity I usually prefix it with the entity name in the value mapped to the
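A minimal sketch of that prefixing scheme (entity names, the separator, and the method are illustrative, not from the thread):

```java
public class UniqueIdDemo {
    // Sketch: make ids unique across entities by prefixing the table id
    // with the entity name, so mission 42 and event 42 no longer collide.
    static String uniqueId(String entity, long rowId) {
        return entity + "-" + rowId;
    }

    public static void main(String[] args) {
        System.out.println(uniqueId("mission", 42)); // mission-42
        System.out.println(uniqueId("event", 42));   // event-42
    }
}
```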
: does the solr.TrieDateField support dates BC?
: I indexed negative dates and I'm able to query them,
: but if I store them, they show up as positive dates.
Hmm... definitely seems to be a bug.
I *think* this is another manifestation of SOLR-1899 (because of how the
hokey formatting code
Have a look at http://lucene.apache.org/java/3_0_2/scoring.html on how Lucene's
scoring works. You can override the Similarity class in Solr as well via the
schema.xml file.
On Dec 15, 2010, at 10:28 AM, Pavel Minchenkov wrote:
Hi,
Please give me advice on how to create custom scoring. I
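For reference, a custom Similarity is wired in via schema.xml roughly like this (the class name is illustrative, not from the thread):

```xml
<!-- schema.xml: top-level element; the class name is hypothetical -->
<similarity class="com.example.CustomSimilarity"/>
```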
Lucid Imagination is pleased to announce the general availability of our Apache
Solr/Lucene powered LucidWorks Enterprise (LWE). LWE is designed to make it
easier for people to get up to speed on search by providing easier management,
integration with libraries commonly used in building search
: I am trying to debug my queries and see how scoring is done. I have 6 cores
and
: send the query to 6 shards and it's a dismax handler (with search on various
: fields with different boostings). I enable debug, and view source but I'm
unable
: to see the explanations. I'm returning ID and
: Subject: limit the search results to one category
: References: 427522.34555...@web52907.mail.re2.yahoo.com
: 930238.38683...@web51308.mail.re2.yahoo.com
: In-Reply-To: 930238.38683...@web51308.mail.re2.yahoo.com
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing
: SimplePostTool: FATAL: Solr returned an error:
:
Unexpected_character_m_code_109_in_prolog_expected___at_rowcol_unknownsource_11
if you look at your solr log (or the HTTP response body, SimplePostTool
only gives you the status line) you'll see the more human readable form of
that error
Congrats!
A couple questions:
1) Which version of Solr is this based on?
2) How is LWE different from standard Solr? How should one choose between the
two?
Thanks.
--- On Wed, 12/15/10, Grant Ingersoll gsing...@apache.org wrote:
From: Grant Ingersoll gsing...@apache.org
Subject: [ANN]
Ahhh...I found that I did not set a dataSource name and when I did that and
then referred each entity to that dataSource all went according to plan ;-)
<?xml version="1.0" encoding="utf-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource"
              name="bleh"
: This is a fairly basic synonyms question: how does synonyms handle stemming?
it's all a question of how your analysis chain is configured for the field
type.
if you have your stemming filter before your synonyms filter, then the
synonyms.txt file needs to map the *stems* of the synonyms.
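As a sketch (a hypothetical fieldType using standard Solr filter factories): with the stemmer before the synonym filter, synonyms.txt must list stemmed forms; swapping the two filters applies synonyms to the raw tokens instead.

```xml
<!-- Hypothetical analyzer: stemming runs BEFORE synonyms, so
     synonyms.txt has to map stemmed forms. Swap the two <filter>
     lines to apply synonyms on the raw tokens instead. -->
<fieldType name="text_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```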
: One of our developers had initially tried swapping solr cores (e.g. core0
: and core1) using the solrj api, but it failed. (don't have the exact error)
: He subsequently replaced the call with straight http (i.e. http client).
:
: Unfortunately I don't have the exact error in front of me...
I will look into the security and processor power implications of that. Good
idea, thx.
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better
idea to learn from others’ mistakes, so you do not have to make them
It's been working for me. One thing to look out for might be the url
you're using in SolrUtil.getSolrServer()? The url you use for
reindexing won't be the same as the one you use to swap cores. Make
sure it's using admin/cores and not production/admin/cores or
reindex/admin/cores.
Sorry if this
Hello all,
Are there any general guidelines for determining the main factors in memory use
during merges?
We recently changed our indexing configuration to speed up indexing but in the
process of doing a very large merge we are running out of memory.
Below is a list of the changes and part of
well, it was three problems:
1/ I was saving the file as a 'complete web page', unknowingly, from firefox.
2/ I had a small message for troubleshooting being spit out after the json.
3/ My partner had output all the spatial solr 'tiers' information, and there's a
binary value in there that stops
Hi Christopher,
One option comes to mind: shingles?
I have not done anything with them yet, but that is on my radar for
sometime about a month out. Speaking unencumbered by experience or
substantial understanding, my guess is that shingles would be great for
you if you can select
I just want to say that this mailing list has been invaluable to a newbie like
me ;-) I posted a question earlier today and literally 10 minutes later I
got an answer that helped me solve my problem. This is proof that there is an
experienced and energetic community behind this FOSS group of projects
Can you do just one join in the top-level query? The DIH does not have
a batching mechanism for these joins, but your database does.
On Wed, Dec 15, 2010 at 7:11 AM, Tim Heckman theck...@gmail.com wrote:
The custom import I wrote is a java application that uses the SolrJ
library. Basically,