Sorry for being unclear, and thank you for answering.
Consider the following documents A(k0,k1,k2), B(k1,k2,k3), and C(k0,k2,k3),
where A, B, C are document identifiers and the k's in brackets are the
terms each contains.
So Solr inverted index should be something like:
k0 -- A | C
k1 -- A | B
k2 -- A | B | C
k3 -- B | C
Now let q=k1; how do I make sure C doesn't appear as a result, since it
doesn't contain any occurrence of k1?
Or do we even need to bother with that? That's what Lucene does :)
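The lookup being described can be sketched as a plain map from term to posting list (illustrative only, not Lucene's actual data structures):

```python
# A minimal inverted index for the documents above: each term maps to the
# list of documents containing it. A query for k1 can only ever see the
# documents in k1's posting list, so C (which lacks k1) is never even a
# candidate.
index = {
    "k0": ["A", "C"],
    "k1": ["A", "B"],
    "k2": ["A", "B", "C"],
    "k3": ["B", "C"],
}

def search(term):
    # Documents absent from the posting list are never examined at all.
    return index.get(term, [])

print(search("k1"))  # ['A', 'B'] -- C cannot appear
```

This is exactly why the question answers itself: exclusion of non-matching documents falls out of the index structure for free.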
On Tue, Jun 7, 2011 at 8:43 AM, pravesh suyalprav...@yahoo.com wrote:
k0 -- A | C
k1 -- A | B
k2 -- A | B | C
k3 -- B | C
Now let q=k1; how do I make sure C doesn't appear as a result, since it
doesn't contain any occurrence of k1?
Or do we even need to bother with that? That's what Lucene does :)
thanks Jayendra..
From: Jayendra Patil jayendra.patil@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, 7 June, 2011 6:55:58 AM
Subject: Re: Master Slave help
Do you mean the replication happens every time you restart the server?
If so, you would need
Hi,
My commit seems to be taking too much time; if you notice from the DataImport
status given below, to commit 1000 docs it's taking longer than 24 minutes:
</lst>
<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
<str name="Time
Hi
We are using the ExtractingRequestHandler and we are getting the following
error when we submit a Microsoft docx file for indexing.
I think this has something to do with the date field definition... but I'm
not very sure... what field type should we use?
2. we are trying to index jpg (when we search over
Hello,
I have a SOLR implementation with 1M products. Every product has some
information; let's say a television has some information about pixels and
inches, while a computer has information about hard disk, CPU, and GPU. When
a user searches for computer I want to show the correct facets. An example:
User
Hi,
I need to use the function query operations with the score of a given
query, but only on the docset that I get from the query, and I don't know if
this is possible.
Example:
q=shops in madrid returns docs with a specific score for each doc,
but now i need to do some stuff like
I have a need to index an internal instance of Mediawiki. I'd like to
use DIH if I can since I have access to the database but the example
provided on the Solr wiki uses a Mediawiki dump XML file.
Does anyone have any experience using DIH in this manner? Am I barking
up the wrong tree and
As per the subject, I am getting java.lang.NoClassDefFoundError
org/carrot2/core/ControllerFactory
when I try to run clustering.
I am using Solr 3.1:
I get the following error:
java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory
at
I created the file and reloaded Solr - the ExternalFileField works fine. If I
change the external files and do
curl http://127.0.0.1:4900/solr/site/update -H 'Content-Type: text/xml'
--data-binary '<commit/>'
then no changes are made. If I start Solr without external files
Gabriele
Lucene uses a combination of the boolean and vector space models for its IR.
A straightforward query for a keyword will only match docs with that keyword.
Now things quickly get subtle and complex the more sugar you add (more
complicated queries across fields and more complex analysis chains), but I
think
I added the following to my configuration:
<lib dir="c:/projects/solrtest/dist/" regex="apache-solr-clustering-.*\.jar" />
<requestHandler name="clusty" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="echoParams">explicit</str>
<bool name="clustering">true</bool>
<str
Are you optimizing? That is unnecessary when committing, and is often the
culprit.
Best
Erick
On Tue, Jun 7, 2011 at 5:42 AM, Rohit Gupta ro...@in-rev.com wrote:
Hi,
My commit seems to be taking too much time; if you notice from the DataImport
status given below, to commit 1000 docs it's
how this method
(http://localhost:8983/solr/select?shards=Machine:Port/SolrPath,Machine:Port/SolrPath&indent=true&q=query)
is better than zooKeeper, could you please refer any performance doc.
On 7 June 2011 08:18, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com
wrote:
Instead of
As I may have mentioned before, VuFind is actually doing two Solr queries for
every search -- a base query that gets basic spelling suggestions, and a
supplemental spelling-only query that gets shingled spelling suggestions. If
there's a way to get two different spelling responses in a single
Hello,
I have some problems with the installation of the new PECL package
solr-1.0.1.
I run this command:
pecl uninstall solr-beta (to uninstall the old version, 0.9.11)
pecl install solr
The install runs, but then it gives the following error message:
Finally figured out the problem.
--
View this message in context:
http://lucene.472066.n3.nabble.com/java-lang-AbstractMethodError-at-org-apache-solr-handler-ContentStreamHandlerBase-handleRequestBody--tp3026470p3034456.html
Sent from the Solr - User mailing list archive at Nabble.com.
I am currently experimenting with the Solr Cloud code on trunk and just had
a quick question. Let's say my setup has 3 nodes: a, b, and c. Node a has
1000 results which match a particular query, b has 2000, and c has 3000. When
executing this query and asking for row 900, what specifically happens?
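Conceptually (this is only an illustration, not Solr's actual distributed-search code), the coordinator must ask each shard for its top start + rows documents by score, merge them, and throw away the first start. That is also why deep paging is expensive: every shard has to produce start + rows candidates.

```python
import heapq

def distributed_page(shard_results, start, rows):
    """Sketch of a distributed-search coordinator merging shard results.

    shard_results: per-shard lists of (score, doc_id) tuples, each already
    sorted by descending score. To serve rows [start, start+rows) of the
    merged set, each shard must contribute its top start + rows docs.
    """
    top_per_shard = [shard[: start + rows] for shard in shard_results]
    # heapq.merge keeps the combined stream sorted by descending score.
    merged = heapq.merge(*top_per_shard, key=lambda sd: -sd[0])
    return list(merged)[start : start + rows]

# Three hypothetical shards with 1000, 2000, and 3000 matches.
shard_a = [(1.0 - i / 1000, f"a{i}") for i in range(1000)]
shard_b = [(1.0 - i / 2000, f"b{i}") for i in range(2000)]
shard_c = [(1.0 - i / 3000, f"c{i}") for i in range(3000)]

page = distributed_page([shard_a, shard_b, shard_c], start=900, rows=10)
print(len(page))  # 10
```

Note the cost structure: asking for row 900 forces each shard to return 910 candidates, so the coordinator sorts through 2730 tuples to emit 10.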
On Tue, Jun 7, 2011 at 9:35 AM, Jamie Johnson jej2...@gmail.com wrote:
I am currently experimenting with the Solr Cloud code on trunk and just had
a quick question. Let's say my setup has 3 nodes: a, b, and c. Node a has
1000 results which match a particular query, b has 2000, and c has 3000.
One way is to use the boost qparser:
http://search-lucene.com/jd/solr/org/apache/solr/search/BoostQParserPlugin.html
q={!boost b=productValueField}shops in madrid
Or you can use the edismax parser, which has a boost parameter that
does the same thing:
defType=edismax&q=shops in
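For reference, the two request forms might be built like this (a sketch: the field name productValueField and the query text come from the thread; everything else is a placeholder):

```python
from urllib.parse import urlencode

# Option 1: the boost qparser, wrapping the query inline.
boost_qparser = urlencode({"q": "{!boost b=productValueField}shops in madrid"})

# Option 2: edismax, with the multiplicative boost as a separate parameter.
edismax = urlencode({
    "defType": "edismax",
    "q": "shops in madrid",
    "boost": "productValueField",
})

print(boost_qparser)
print(edismax)
```

Both multiply the relevance score by the field value; they differ only in where the boost is declared.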
Thanks, but it's not what I'm looking for, because the BoostQParserPlugin
multiplies the score of the query with the function queries defined in its b
param, and I can't use edismax because we have our own qparser. It seems
that I have to code another qparser.
Demian,
If you omit spellcheckIndexDir from the configuration, it will create an
in-memory spelling dictionary.
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
-Original Message-
From: Demian Katz [mailto:demian.k...@villanova.edu]
Sent: Tuesday, June 07, 2011
Hi Yonik,
thanks, it's working in trunk now again... I had to re-index though
because of exceptions at startup; did the index format change again
between the early/mid-May trunk and the current trunk?
best regards,
Stefan
On 03.06.2011 15:32, Yonik Seeley wrote:
This bug was introduced
OK... The fix I thought would fix it didn't (which was to use the
commitWithin feature). What I can gather from `ps` is that the thread has
pages locked in memory. Currently I'm using native locking for Solr. Would
switching to simple help alleviate this problem?
Chris
On Jun 4, 2011,
I feel like this should be fairly easy to do but I just don't see anywhere
in the documentation on how to do this. Perhaps I am using the wrong search
parameters.
On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb
brian.l...@journalexperts.comwrote:
Hi all,
Is it possible to change the query parser
Hey there. I was wondering if Solr can be embedded into my Java web app. As
far as I know, Solr comes as a war, or bundled with Jetty if you don't have a
container. I've opened the war's web.xml and found out that it only has a
couple of servlets and filters, and that's it.
So, would it be possible to
Um, normally that would never happen, because, well, like you say, the
inverted index doesn't have docC for term K1, because doc C didn't
include term K1.
If you search on q=K1, then how/why would docC ever be in your result
set? Are you seeing it in your result set? The question then would
Nope, not possible.
I'm not even sure what it would mean semantically. If you had default
operator OR ordinarily, but default operator AND just for field2,
then what would happen if you entered:
field1:foo field2:bar field1:baz field2:bom
Where the heck would the ANDs and ORs go? The
Hi Federico, you can take a look at this wiki page:
http://wiki.apache.org/solr/EmbeddedSolr
Solr also has some maven support; see the ant target generate-maven-artifacts,
don't know if that's what you need.
Regards,
Tomás
On Tue, Jun 7, 2011 at 12:17 PM,
You are right, Lucene will return based on my scoring function
implementation (the Similarity class,
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Similarity.html
):
score(q,d) =
Okay, if you're using a custom similarity, I'm not sure what's going on,
I'm not familiar with that.
But ordinarily, you are right, you would require k1 with +k1.
What you say about the + being lost suggests something is going wrong.
Either you are not sending your query to Solr properly
Hi all,
I have a problem with my index. Even though I always index the same
data over and over again, whenever I try
a couple of searches (they are always the same, as they are issued by a
unit test suite) I do not get the same
results; sometimes I get 3 successes and 2 failures, and sometimes it
Hello!
My problem is as follows: I've got a field (indexed and stored both set to
true) tokenized by whitespace and other patterns, with a gap of value
100. For example, if I index the following expression in the field that I
mentioned:
*Expression*: A B C D E -> *Index*: tokenA
Thanks Yonik. I have a follow-on now: how does Solr ensure consistent
results across pages? So for example, if we had my 3 theoretical Solr
instances again and a, b, and c each returned 100 documents with the same
score, and the user only requested 100 documents, how are those 100 documents
chosen
Hi Jonathan,
Thank you for your reply. Your point about my example is a good one. So let
me try to restate using your example. Suppose I want to apply AND to any
search terms within field1.
Then
field1:foo field2:bar field1:baz field2:bom
would by written as
My first guess would be that you are using AND as the default operator? You
can see the generated query by using the parameter debugQuery=true.
On Tue, Jun 7, 2011 at 1:34 PM, Luis Cappa Banda luisca...@gmail.comwrote:
Hello!
My problem is as follows: I've got a field (indexed and stored both set to
On Tue, Jun 7, 2011 at 1:01 PM, Jamie Johnson jej2...@gmail.com wrote:
Thanks Yonik. I have a follow on now, how does Solr ensure consistent
results across pages? So for example if we had my 3 theoretical solr
instances again and a, b and c each returned 100 documents with the same
score and
There's no feature in Solr to do what you ask, no. I don't think.
On 6/7/2011 1:30 PM, Brian Lamb wrote:
Hi Jonathan,
Thank you for your reply. Your point about my example is a good one. So let
me try to restate using your example. Suppose I want to apply AND to any
search terms within field1.
On Tue, Jun 7, 2011 at 12:34 PM, Luis Cappa Banda luisca...@gmail.com wrote:
*Expression*: A B C D E F G H I
As written, this is equivalent to
*Expression*: A default_field:B default_field:C default_field:D
default_field:E default_field:F default_field:G default_field:H
default_field:I
Try
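The default-field expansion described above can be imitated with a toy helper (a simplification of the real Lucene query parser; `expand_default_field` is a made-up name for illustration):

```python
def expand_default_field(query, default_field):
    """Attach the default field to every bare term in a query string.

    In the Lucene query syntax, a field prefix applies only to the term
    it is attached to; every following bare term falls back to the
    default field. This toy version ignores phrases, parentheses, and
    escaping.
    """
    out = []
    for token in query.split():
        out.append(token if ":" in token else f"{default_field}:{token}")
    return " ".join(out)

print(expand_default_field("title:A B C", "text"))
# title:A text:B text:C
```

This is why a query written as `myfield:A B C ...` only searches myfield for A, exactly as the reply points out.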
I have a Solr Cloud setup with 2 servers; when executing a query against
them of the form:
I have a field defined as:
<field name="content" type="text" indexed="true" stored="false"
termVectors="true" multiValued="true" />
where text is unmodified from the schema.xml example that came with Solr
1.4.1.
I have documents with some compound words indexed, words like Sandstone. And
in several cases
catenateWords should be set to true. Same goes for the index analyzer.
preserveOriginal would also work.
I have a field defined as:
<field name="content" type="text" indexed="true" stored="false"
termVectors="true" multiValued="true" />
where text is unmodified from the schema.xml example that came
Hi all,
we're using Solr 1.4 and the external file field ([1]) for sorting our
search results. We have about 40,000 terms for which we use this sorting
option. Currently we're running into massive OutOfMemory problems and we're
not quite sure what the matter is. It seems that the garbage collector
Hi,
I am very new to Solr, and my client is trying to add full-text
search capabilities to their product by using Solr. They will also have a
master storage that will be the authoritative data store, which will also
provide metadata searches. Can you please point me in the right
Well, this is odd. Several questions:
1. What do your logs show? I'm wondering if somehow some data is getting
rejected. I have no idea why that would be, but if you're seeing indexing
exceptions that would explain it.
2. On the admin/stats page, are maxDocs and numDocs the same in the
WordDelimiterFilterFactory is doing this to you. It's not clear to me that you
want this in place at all.
Look at admin/analysis for that field to see how that filter breaks things up,
it's often surprising to people.
Best
Erick
On Tue, Jun 7, 2011 at 3:13 PM, kenf_nc ken.fos...@realestate.com
see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
from the wiki
Example of generateWordParts=1 and catenateWords=1:
PowerShot -> 0:Power, 1:Shot, 1:PowerShot
(where 0,1,1 are token positions)
A's+B's&C's -> 0:A, 1:B, 2:C, 2:ABC
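For illustration only, here is a rough Python imitation of the splitting and catenation behavior quoted above (not the real WordDelimiterFilterFactory, which handles many more cases and emits proper token positions):

```python
import re

def word_delimiter(token, catenate=True, preserve_original=False):
    """Toy version of generateWordParts / catenateWords / preserveOriginal.

    Splits on case transitions, non-alphanumerics, and letter/digit
    boundaries, then optionally appends the catenated form and/or the
    original token.
    """
    parts = re.findall(r"[A-Z][a-z]*|[a-z]+|\d+", token)
    out = list(parts)
    if catenate and len(parts) > 1:
        out.append("".join(parts))  # catenated form, e.g. PowerShot
    if preserve_original and token not in out:
        out.append(token)
    return out

print(word_delimiter("PowerShot"))  # ['Power', 'Shot', 'PowerShot']
```

This makes the thread's point concrete: with catenateWords only at query time, the index never contains the catenated term, so the query form has nothing to match; the catenation (or preserveOriginal) has to happen at index time too.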
I tried setting catenateWords=1 on the Query analyzer and that didn't do
anything. I think what I need is to set my Index Analyzer to have
preserveOriginal=1 and then re-index everything. That will be a pain, so
I'll do a small test to make sure first. I'm really surprised
preserveOriginal=1 isn't
Hi Brian, could your front-end app do this field query logic?
(assuming you have an app in front of Solr)
On 7 June 2011 18:53, Jonathan Rochkind rochk...@jhu.edu wrote:
There's no feature in Solr to do what you ask, no. I don't think.
On 6/7/2011 1:30 PM, Brian Lamb wrote:
Hi Jonathan,
Hi,
I'm having some trouble using Solr through ColdFusion. The problem right
now is that when I search for a term in a custom field, the results
sometimes have the value that I sent to the custom field and not the
field that contains the text. This is the cfsearch syntax that I'm using:
Can you see the query actually presented to Solr in the logs?
Maybe capture that and then run it with debug enabled in the admin pages.
Sorry I can't help directly with your syntax.
On 7 June 2011 23:06, Alejandro Delgadillo adelgadi...@febg.org wrote:
Hi,
I'm having some trouble using Solr
You must catenate words at index time as well.
I tried setting catenateWords=1 on the Query analyzer and that didn't do
anything. I think what I need is to set my Index Analyzer to have
preserveOriginal=1 and then re-index everything. That will be a pain, so
I'll do a small test to make sure
Hello,
I am testing solr 3.2 and have problems with wildcards.
I am indexing values like IA 300; IC 330; IA 317; IA 318 in a field GOK,
and can't find a way to search with wildcards.
I want to use a wildcard search to match something like IA 31? but cannot
find a way to do so.
GOK:IA\ 38*
Thanks Lee for the quick response.
Let me explain it a little bit better:
in the CFSEARCH tag, you use the CRITERIA attribute. What it does by
default is send to SOLR via POST the user's search query against the field
where the text is stored, in this case, since I'm indexing PDF
Yes there is, but you haven't provided enough information to
make a suggestion. What is the fieldType definition? What is
the field definition?
Two resources that'll help you greatly are:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
and the admin/analysis page...
Best
Erick
On
Hello,
What are the biggest document fields that you've ever indexed in Solr, or
that you've heard of? Ah, it must be Tom's HathiTrust. :)
I'm asking because I just heard of a case of an index where some documents
have a field that can be around 400 MB in size! I'm curious if anyone has
From older (2.4) Lucene days, I once indexed the 23 volume Encyclopedia
of Michigan Civil War Volunteers in a single document/field, so it's probably
within the realm of possibility at least G...
Erick
On Tue, Jun 7, 2011 at 6:59 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Hello,
I think the question is strange... Maybe you are wondering about possible
OOM exceptions? I think we can pass to Lucene a single document containing a
comma-separated list of term, term, ... (a few billion times)... except for
stored fields and the TermVectorComponent...
I believe thousands of companies have already indexed
Hi,
I think the question is strange... Maybe you are wondering about possible
OOM exceptions?
No, that's an easier one. I was more wondering whether with 400 MB Fields
(indexed, not stored) it becomes incredibly slow to:
* analyze
* commit / write to disk
* search
I think we can pass to
Hi Otis,
I am recalling the pagination feature; it is still unresolved (with the
default scoring implementation): even with small documents,
searching/retrieving documents 1 to 10 can take 0 milliseconds, but documents
100,000 to 100,010 can take a few minutes (I saw it with the trunk version 6
months ago, and
The Salesforce book is 2800 pages of PDF, last I looked.
What can you do with a field that big? Can you get all of the snippets?
On Tue, Jun 7, 2011 at 5:33 PM, Fuad Efendi f...@efendi.ca wrote:
Hi Otis,
I am recalling pagination feature, it is still unresolved (with default
scoring
Hi Otis,
Our OCR fields average around 800 KB. My guess is that the largest docs we
index (in a single OCR field) are somewhere between 2 and 10 MB. We have had
issues where the in-memory representation of the document (the in-memory
index structures being built) is several times the size of
Hi, can somebody answer this...
3. can somebody tell me how to do indexing for a zip file?
1. while sending docx, we are getting the following error:
java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z"
at