Generally speaking, by convention boosts in Lucene have unity at 1.0,
not 0.0. So, a negative boost is usually done with boosts between 0
and 1. For this case, maybe a boost of 0.1 is what you want?
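For example, with the standard query syntax a sub-1.0 term boost demotes matches without excluding them; the field and term here are just placeholders:

```
title:foo^0.1
```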
I forgot to say I tried what you suggested as well, but it didn't work.
In the standard query parser,
Hi,
I am a newbie to Solr and have been exploring it for the last few days.
I am using Solr Cell with Tika for parsing, indexing and searching,
posting rich text documents via SolrJ.
My actual requirement is that instead of using local documents (pdf, doc, docx),
I want to use webpages (urls for
On Wed, Feb 3, 2010 at 12:21 AM, Charlie Jackson charlie.jack...@cision.com
wrote:
Currently, I've got a Solr setup in which we're distributing searches
across two cores on a machine, say core1 and core2. I'm toying with the
notion of enabling Solr's HTTP caching on our system, but I noticed
Thanks, but still no luck with that:
*:* AND -fieldX:[* TO *] - returns 0 docs
fieldX:(a*) - returns docs, so I'm sure that there are docs with this field filled.
Any other ideas what could be wrong?
Frederico
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent:
I tried another one:
fieldX:[ TO *] and it returns articles with the field filled :), so I guess
I'm getting there.
But I also tried fieldX:[ TO *] and got a few more results than the first
one...
Is there a real difference between these, and also if the results are really
all docs with
'!'
:)))
Plus, FastLRUCache (previous one was synchronized)
(and of course warming-up time) := start complaining after ensuring there are
no complaints :)
(and of course OS needs time to cache filesystem blocks, and Java HotSpot,
... - few minutes at least...)
On Feb 3, 2010, at 1:38 PM, Rajat
Before you file a JIRA issue:
I don't believe this is a bug, so there is likely no need for JIRA. Try
putting the date.formats snippet in the defaults section rather than
simply within the RequestHandler tags. Then you should be good to go.
--
- Mark
http://www.lucidimagination.com
Lance
Not entirely true - that's the case in Lucene, but in Solr, top-level
queries *can* start with minus or NOT. They cannot if they are nested.
Both
*:* AND -fieldX:[* TO *]
and
-fieldX:[* TO *]
are the same in Solr.
--
- Mark
http://www.lucidimagination.com
Lance Norskog wrote:
Queries
*:* AND -fieldX:[* TO *] - returns 0 docs
fieldX:(a*) - returns docs, so I'm sure that there's docs
with this field filled.
Any other ideas what could be wrong?
There is nothing wrong in this scenario.
If -fieldX:[* TO *] returns 0 docs, it means that all of your documents have
that fieldX
Hello All,
I am trying to start the Solr server using Jetty (same as in the
Solr tutorial on their website). As the index size is around 3.5gb
it's returning OutOfMemoryError. Is it mandatory to satisfy the
condition java heap size > index size? If yes, is there any
solution to run Solr
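As a later reply in this thread notes, the heap does not need to hold the whole index. A first thing to try is simply raising the JVM heap when starting Jetty; the 1024m value below is just an example, not a recommendation:

```shell
java -Xmx1024m -jar start.jar
```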
Hello,
We are using Solr(v 1.3.0 694707 with Lucene version 2.4-dev 691741)
in multicore mode with an average of 400 indexes (all indexes have the
same structure).
These indexes are stored on a nfs disk.
A java process writes continuously to these indexes while solr is only
used to read
That's correct.
If you want to find missing values,
i.e. fields for which a value is not present, then you will use -
Ankit
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Thursday, February 04, 2010 9:41 AM
To: solr-user@lucene.apache.org
Subject: RE: query all filled
There might be an OCR plugin for Apache Tika (which does exactly this out of
the box except for the OCR capability, I believe).
http://lucene.apache.org/tika/
-mike
2010/2/4 Kranti™ K K Parisa kranti.par...@gmail.com
Hi,
Can anyone list the best OCR APIs available to use in combination with
Hi list,
I'm using the ExtractingRequestHandler to extract content from
documents. It's extracting the last_modified field quite fine, but of
course only for documents where this field is set. If this field is not
set I want to pass the file system timestamp of the file.
I'm doing:
final
Hi everyone,
I am currently trying to set up JMX support for Solr, but somehow the
listening socket is not even created on my specified port.
My parameters look like this (running the Solr example):
java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=6060
Cool, this way it's no longer crashing.
Thanks and Regards,
Chris
On 04.02.2010 14:29, Mark Miller wrote:
Before you file a JIRA issue:
I don't believe this is a bug, so there is likely no need for JIRA. Try
putting the date.formats snippet in the defaults section rather than
simply
I understand that upon performing an import (full-import or delta-import), the
dataimport.properties file is written to with a last_index_time which can
then be accessed by the data-config.xml for delta-import queries with
${dataimporter.last_index_time}.
I was curious if another key could be
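For context, a typical delta-import entity that consumes that key in data-config.xml looks roughly like this; the table and column names are invented for illustration:

```xml
<entity name="item"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item WHERE id = '${dih.delta.id}'"/>
```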
Hi folks,
as we're moving to solr 1.4 replication, I want to know about backups.
Questions
-
1. Properties that can be set to configure this feature (only know
backupAfter)
2. Is it an incremental backup or a full index snapshot?
Thx
--
Lici
~Java Developer~
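For reference, backupAfter is set in the ReplicationHandler's master section in solrconfig.xml; a minimal sketch:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <!-- take a backup whenever an optimize finishes -->
    <str name="backupAfter">optimize</str>
  </lst>
</requestHandler>
```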
Yes, Tika indexes all formats,
but I am specifically looking for OCR (through Java), at least for PDF or JPEG
images.
Any clues?
Best Regards,
Kranti K K Parisa
On Thu, Feb 4, 2010 at 8:29 PM, mike anderson saidthero...@gmail.com wrote:
There might be an OCR plugin for Apache Tika (which does
Hi,
I'm having some trouble getting this to work on a snapshot from 3rd Feb. My
config looks as follows:
<dataSource name="ora" driver="oracle.jdbc.OracleDriver" url= />
<datasource name="orablob" type="FieldStreamDataSource" />
<document name="mydoc">
<entity dataSource="ora" name="meta"
Christoph Brill wrote:
Cool, this way it's no longer crashing.
Thanks and Regards,
Chris
On 04.02.2010 14:29, Mark Miller wrote:
Before you file a JIRA issue:
I don't believe this is a bug, so there is likely no need for JIRA. Try
putting the date.formats snippet in the defaults
Good job Mark, works fine and does not keep my files open.
Thanks,
Chris
On 03.02.2010 15:24, Mark Miller wrote:
Hey Christoph,
Could you give the patch at
https://issues.apache.org/jira/browse/SOLR-1744 a try and let me know
how it works out for you?
Mark Miller wrote:
Christoph Brill wrote:
Cool, this way it's no longer crashing.
Thanks and Regards,
Chris
On 04.02.2010 14:29, Mark Miller wrote:
Before you file a JIRA issue:
I don't believe this is a bug, so there is likely no need for JIRA. Try
putting the
Theoretically yes, it's correct, but I have about 1/10 of the docs with
this field not empty and the rest empty.
Most of the articles have the field empty, as I can see when querying *:*.
So the queries don't make sense...
-Original Message-
From: Ankit Bhatnagar
Looks like it works. No crashes, and the log states it was added. I
didn't test against actual data, though.
04.02.2010 17:14:13
org.apache.solr.handler.extraction.ExtractingRequestHandler inform
INFO: Adding Date Format: yyyy-MM-dd HH:mm:ss
04.02.2010 17:14:13
Hi,
I have some common stopwords defined like [a,the,of] etc. Our users need the
ability to include stopwords in their search. I tried using + sign like,
[Bank +of America] to get accurate results, but it does not work.
Does anybody know how to provide this ability to search for stopwords - we
Hi,
In our indexes, sometimes we have some documents written in other languages
different from the index's most common language. Is there any way to give
less boost to these documents?
Thanks in advance,
Raimon Bosch.
--
View this message in context:
Hi,
I have some common stopwords defined like [a,the,of] etc.
Our users need the
ability to include stopwords in their search. I tried using
+ sign like,
[Bank +of America] to get accurate results, but it does not
work.
Does anybody know how to provide this ability to search
for
Heya -
So we just upgraded our Solr install to 1.4, and there's a great CPU drop
and query response time drop. Good!
But we're seeing the slowdown in the collection of statistics (stats.jsp)
mentioned here:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg30224.html
to the tune of
I've made a backup request to my local solr server; it works, but can I
set the snapshot dir path?
On 4 February 2010 16:54, Licinio Fernández Maurelo
licinio.fernan...@gmail.com wrote:
Hi folks,
as we're moving to solr 1.4 replication, i want to know about backups.
Questions
john allspaw wrote:
Heya -
So we just upgraded our Solr install to 1.4, and there's a great CPU drop
and query response time drop. Good!
But we're seeing the slowdown in the collection of statistics (stats.jsp)
mentioned here:
In our indexes, sometimes we have some documents written in
other languages
different from the index's most common language. Is there any
way to give less
boost to these documents?
If you are aware of those documents, at index time you can boost those
documents with a value less than 1.0:
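An index-time document boost in the XML update format looks like this; the 0.5 value and the field contents are placeholders:

```xml
<add>
  <doc boost="0.5">
    <field name="id">doc-in-other-language</field>
    <field name="text">...</field>
  </doc>
</add>
```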
Does anybody know how to provide this ability to
search for stopwords
CommonGramsFilterFactory [1] may help.
Sorry, Solr 1.4 has this filter.
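A schema.xml sketch wiring the filter into an analyzer chain (the field type name, tokenizer choice, and words file are illustrative):

```xml
<fieldType name="text_cg" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- emits token pairs like "bank_of" and "of_america" so that
         stopwords remain searchable inside phrases -->
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```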
XML update. I'm serializing the doc in .NET, and then using solsharp to
insert/update the doc in SOLR.
The result is:
<doc>
  <str name="fieldX"/>
</doc>
Does this mean I'm adding a whitespace on XML update?
Frederico
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
XML update. I'm serializing the doc
in .NET, and then using solsharp to
insert/update the doc in SOLR.
The result is:
<doc>
  <str name="fieldX"/>
</doc>
Does this mean I'm adding a whitespace on XML Update?
Yes, exactly. You can remove <field name="fieldX"></field> from your
<add>
<doc>
...
Thank you for the responses!
-Original Message-
From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant
Ingersoll
Sent: Wednesday, February 03, 2010 1:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Guidance on Solr errors
Inline below.
On Feb 2, 2010, at 8:40 PM,
On Feb 4, 2010, at 12:38 AM, Lance Norskog wrote:
Queries that start with minus or NOT don't work. You have to do this:
*:* AND -fieldX:[* TO *]
That's only true for subqueries. A purely negative single top-level
clause works fine with Solr.
Erik
On Wed, Feb 3, 2010 at
I've analyzed my indexing application and checked the XML before executing the
http request, and the field is empty:
<field name="fieldX" />
It should be empty in SOLR.
Probably something on the way between my application (.NET) and SOLR (Jetty
on Ubuntu) adds the whitespace.
Anyway, I'll try
Yes, it's true that we could do it at index time if we had a way to know. I
was thinking of a solution at search time, maybe measuring the % of
stopwords in each document. Normally, a document in another language won't
have any stopwords of the main language.
If you know some external
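The search-time heuristic described above could be sketched like this; the stopword set and the 10% threshold are made-up illustrations, not anything from Solr:

```python
# Sketch: estimate whether a document is in the index's main language
# by the fraction of its tokens that are common stopwords of that language.
# The stopword set and threshold are illustrative placeholders.
ENGLISH_STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "is"}

def stopword_fraction(text, stopwords=ENGLISH_STOPWORDS):
    """Fraction of whitespace-separated tokens that are stopwords."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in stopwords for t in tokens) / len(tokens)

def probably_main_language(text, threshold=0.1):
    """Heuristic: text in the main language contains some of its stopwords."""
    return stopword_fraction(text) >= threshold
```

A document scoring below the threshold would be a candidate for the lower boost discussed above.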
Is it possible to configure the distance formula used by fuzzy
matching? I see there are others under the function query page under
strdist, but I'm wondering if they are applicable to fuzzy matching.
thx much
--joe
I've analyzed my indexing application
and checked the XML before executing the http request, and
the field is empty:
<field name="fieldX" />
It should be empty in SOLR.
Probably something on the way between my application (.NET)
and SOLR (Jetty on Ubuntu) adds the whitespace.
Hi,
I need to have Solr/Jetty running as a Windows Service.
I am using the Lucid distribution.
Does anyone have a running example and tool for this?
med venlig hilsen/best regards
Roland Villemoes
Tel: (+45) 22 69 59 62
E-Mail: mailto:r...@alpha-solutions.dk
Alpha Solutions A/S
Borgergade 2,
The Levenshtein algorithm is currently hardcoded (FuzzyTermEnum class) in
Lucene 2.9.1 and 3.0...
There are samples of other distances in the contrib folder.
If you want to play with distances, check
http://issues.apache.org/jira/browse/LUCENE-2230
It works if the distance is integer-valued and follows the metric space axioms:
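Those axioms hold for the classic Levenshtein edit distance; for reference, a minimal dynamic-programming sketch (an illustration, not Lucene's FuzzyTermEnum code):

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]
```

It is symmetric and satisfies the triangle inequality, which is what makes metric-space tricks like LUCENE-2230 applicable.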
Hi Everyone,
We are indexing quite a lot of data using update/csv handler. For
reasons I can't get into right now, I can't implement a DIH since I
can only access the DB using Stored Procs and stored proc support in
DIH is not yet available. Indexing takes about 3 hours and I don't
want to tax
: http://localhost:8080/solr/core1/select/?q=google&start=0&rows=10&shards
: =localhost:8080/solr/core1,localhost:8080/solr/core2
: You are right, etag is calculated using the searcher on core1 only and it
: does not take other shards into account. Can you open a Jira issue?
...as a possible
On Thu, Feb 4, 2010 at 3:03 PM, Rohit Gandhe rohit.gan...@gmail.com wrote:
We are indexing quite a lot of data using update/csv handler. For
reasons I can't get into right now, I can't implement a DIH since I
can only access the DB using Stored Procs and stored proc support in
DIH is not yet
Thanks Yonik! We want to go to Index replication soon (couple of
months), which will also help with incremental updates. But for now we
want a quick and dirty solution without running two servers. Does the
utility look ok to index a CSV file? Is it safe to do in production
environment? I know
What about using Tomcat instead? Tomcat has Windows service
capability already, right?
Erik
On Feb 4, 2010, at 2:18 PM, Roland Villemoes wrote:
Hi,
I need to have Solr/Jetty running as a Windows Service.
I am using the Lucid distribution.
Does anyone have a running example and
On Thu, Feb 4, 2010 at 4:42 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
What about using Tomcat instead? Tomcat has Windows service capability
already, right?
Another part of the problem is telling the solr webapp where its solr home is.
Options:
- use a tomcat context fragment
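Such a Tomcat context fragment might look like the following; the file location and paths are examples only:

```xml
<!-- e.g. saved as $CATALINA_HOME/conf/Catalina/localhost/solr.xml -->
<Context docBase="/opt/solr/apache-solr-1.4.0.war" debug="0" crossContext="true">
  <!-- tells the solr webapp where its solr home directory lives -->
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home" override="true"/>
</Context>
```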
Transferred partially to solr-user...
Steven, thanks for the reply!
I wonder if PatternReplaceFilter can output multiple tokens? I'd like
to progressively strip the non-alphanums, for example output:
apple!*
apple!
apple!
apple
On Thu, Feb 4, 2010 at 12:18 PM, Steven A Rowe sar...@syr.edu
Solr needs memory allocation for different operations, not for the
index size. It needs X amount of memory for a query, Y amount of
memory per document found by a query, and other things. Sorting needs
memory for the number of documents. Faceting needs memory for the
number of unique values in a
: My parameters look like this (running the Solr example):
:
: java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=6060
: -Dcom.sun.management.jmxremote.authenticate=false
: -Dcom.sun.management.jmxremote.ssl=false -jar start.jar
What implementation/version of java are you
I remember that I had to have a JMX password file with the right permissions,
or it wouldn't start. --wunder
On Feb 4, 2010, at 2:27 PM, Chris Hostetter wrote:
: My parameters look like this (running the Solr example):
:
: java -Dcom.sun.management.jmxremote
Robert, thanks for redoing all the Solr analyzers to the new API! It
helps to have many examples to work from, best practices so to speak.
Answering my own question... PatternReplaceFilter doesn't output
multiple tokens...
Which means messing with capture state...
On Thu, Feb 4, 2010 at 2:16 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Transferred partially to solr-user...
Steven, thanks for the reply!
I wonder if
We just switched over to storing our data directly in Solr as
compressed JSON fields at http://frugalmechanic.com. So far it's
working out great. Our detail pages (e.g.:
http://frugalmechanic.com/auto-part/817453-33-2084-kn-high-performance-air-filter)
now make a single Solr request to grab the
The Tika integration with the DataImportHandler allows you to control
many aspects of what goes into the index, including solving this
problem:
http://wiki.apache.org/solr/TikaEntityProcessor
(Tika is the extraction library, and ExtractingRequestHandler and the
TikaEntityProcessor both use it.)
: bq=(*:* -field_a:54^1)
I think what you want there is bq=(*:* -field_a:54)^1
...you are boosting things that don't match field_a:54
Thanks Hoss. I've updated the Wiki, the content of the bq param was wrong:
I want to recompile Lucene with
http://issues.apache.org/jira/browse/LUCENE-2230, but I'm not sure
which source tree to use. I tried using the implied trunk revision
from the admin/system page, but Solr fails to build with the generated
jars, even if I exclude the patches from 2230...
I'm wondering
Is there a way to return Solr's analyzed/filtered tokens from a query,
rather than the original indexed data? (Ideally at a fairly high level like
solrj).
Thanks
Hi
I want to add a filter to my query which takes documents whose city
field is either Bangalore or Cochin or Bombay. How do I do this?
fq=city:bangalore&fq=city:bombay&fq=city:cochin will take the
intersection. I need the union.
Please help
Thanks
Hi
I want to add a filter to my query which takes documents
whose city
field is either Bangalore or Cochin or Bombay. How do I do
this?
fq=city:bangalore&fq=city:bombay&fq=city:cochin
will take the
intersection. I need the union.
fq=city:(bangalore OR cochin OR bombay)
same syntax as