Hi all,
I indexed nearly 100 Java PDF files, each fairly large (at least 1 MB).
Solr is returning the entire indexed content with the results, which makes
them slow to display. Can we reduce the content it shows, or can I just get
the file names and ids instead of the entire content?
Thanx for your help.
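For what it's worth, returned fields can be limited with the fl parameter; a request sketch (the field names id and filename are assumptions about your schema, and localhost:8983 is the default example port):

```sh
# Return only the id and filename fields instead of the full stored content.
curl 'http://localhost:8983/solr/select?q=java&fl=id,filename&rows=10'
```

Only the stored fields listed in fl are serialized into the response, so the large text bodies stay out of the result payload.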
I bound de.lvm.services.logging.PerformanceLoggingFilter in web.xml
and mapped it to /admin/*.
It works fine with EmbeddedSolr. I get a NullPointerException on some links under
admin/index.jsp, but I will solve that problem.
Robert
2010/8/25 Chris Hostetter hossman_luc...@fucit.org:
On Wed, Aug 25, 2010 at 12:51 PM, satya swaroop sswaro...@gmail.com wrote:
Hi all,
I indexed nearly 100 Java PDF files, each fairly large (at least 1 MB).
Solr is returning the entire indexed content with the results, which makes
them slow to display. Can't we reduce the
On Tue, Aug 24, 2010 at 10:37 AM, Bojan Vukojevic email...@gmail.com wrote:
I am using SolrJ with an embedded Solr server and some documents have a lot of
text. Solr will be running on a small device with very limited memory. In my
tests I cannot process more than 3MB of text (in a body) with
Hi, I am running a ZooKeeper ensemble of 3 ZooKeeper instances
and set up a SolrCloud to work with it (2 masters, 2 slaves).
On each master machine I have 2 shards (4 shards in total).
On one of the masters I keep noticing ZooKeeper-related exceptions which I
can't understand:
One appears to
Hi again Bastian,
2010/8/23 Bastian Spitzer bspit...@magix.net
I don't seem to find decent documentation on how those parameters
actually work.
this is the default, example block:
<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- The number of commit points to be kept -->
Dear ladies and gentlemen.
I'm a newbie with Solr; I didn't find an answer in the wiki, so I'm writing here.
I'm analysing Solr performance and have one problem. *Search time is about
7-10 seconds per query.*
I have a 5 GB *.csv database with about 15 fields and 1 key field (record
number). I
Hi,
Sometimes while indexing to Solr, I am getting the following exception:
com.ctc.wstx.exc.WstxEOFException: Unexpected end of input block in end tag
I think it's some configuration issue. Kindly suggest.
I have a solr working with Tomcat 6
Thanks
Pooja
You should use the tokenizer solr.WhitespaceTokenizerFactory in your field
type to get your terms indexed. Once you have indexed the data, you don't
need to use the * in your queries; that is a heavy query for Solr.
Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26.
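A minimal schema.xml sketch of what Marco describes; the type name text_ws and the lowercase filter are illustrative, not from the original schema:

```xml
<!-- A field type tokenized on whitespace; terms are indexed individually,
     so wrapping every query in * is no longer needed. -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```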
Hi Solr experts,
There is a huge difference doing facet sorting on lex vs count
The strange thing is that count sorting is fast when setting a small limit.
I realize I can do sorting in the client, but I am just curious why this is.
FAST - 16ms
facet.field=city
f.city.facet.limit=5000
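For comparison, the two requests differ only in facet.sort (in Solr 1.4 the accepted values are count and lex); a sketch against the default example port:

```sh
# Sort facet values by document count:
curl 'http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=city&f.city.facet.limit=5000&facet.sort=count'
# Sort facet values lexicographically (index order):
curl 'http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=city&f.city.facet.limit=5000&facet.sort=lex'
```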
On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler impalah...@googlemail.com wrote:
There is a huge difference doing facet sorting on lex vs count
The strange thing is that count sorting is fast when setting a small limit.
I realize I can do sorting in the client, but I am just curious why this is.
On Wed, Aug 25, 2010 at 6:41 AM, Pooja Verlani pooja.verl...@gmail.com wrote:
Hi,
Sometimes while indexing to solr, I am getting the following exception.
com.ctc.wstx.exc.WstxEOFException: Unexpected end of input block in end tag
I think its some configuration issue. Kindly suggest.
I have
On Aug 24, 2010, at 10:55pm, Paul Libbrecht wrote:
Wouldn't the usage of NekoHTML (as an XML parser) and XPath be safer?
I guess it all depends on the quality of the source document.
If you're processing HTML then you definitely want to use something
like NekoHTML or TagSoup.
Note
have a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters to
see how that works.
2010/8/25 Marco Martinez mmarti...@paradigmatecnologico.com
You should use the tokenizer solr.WhitespaceTokenizerFactory in your field
type to get your terms indexed, once you have indexed the
Hi Yonik,
Thanks for your response.
I use Solr 1.4.1
There are 14000 cities in the index.
The type is just a simple string: <fieldType name="string"
class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
The facet method is fc.
You are right I do not need 5000 cities, I was just surprised to see
On Wed, Aug 25, 2010 at 10:07 AM, Eric Grobler
impalah...@googlemail.com wrote:
I use Solr 1.41
There are 14000 cities in the index.
The type is just a simple string: <fieldType name="string"
class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
The facet method is fc.
You are right I do
Hi Yonik,
Thanks for the technical explanation.
I will in general try to use lex and sort by count in the client if there
are not too many rows.
Have a nice day.
Regards
ericz
On Wed, Aug 25, 2010 at 4:41 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Wed, Aug 25, 2010 at 10:07 AM,
On Wed, Aug 25, 2010 at 11:29 AM, Peter Spam ps...@mac.com wrote:
So, I went through all the effort of breaking my documents into max 1 MB chunks,
and searching for "hello" still takes over 40 seconds (searching across 7433
documents):
8 results (41980 ms)
What is going on??? (scroll
Hi,
I'm having a problem where a Solr query on all items in one category
is returning duplicated items when an item appears in more than one
subcategory. My schema involves a document for each item's subcategory
instance. I know this is not correct.
I'm not sure if I ever tried multiple values
I'm not sure what you mean here. You can delete via query or unique id, but
DIH really isn't relevant here.
If you've defined a unique key, simply re-adding a changed document will
delete the old one and insert the new document.
If this makes no sense, could you explain what the underlying
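As a sketch of the two delete styles mentioned above (the id value and query are made up; assumes the default /update handler):

```sh
# Delete by unique id:
curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
  --data-binary '<delete><id>DOC123</id></delete>'
# Delete by query:
curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>category:obsolete</query></delete>'
# Make the deletes visible to searchers:
curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
  --data-binary '<commit/>'
```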
This is a very small number of documents (7000), so I am surprised Solr is
having such a hard time with it!!
I do facet on 3 terms.
Subsequent hello searches are faster, but still well over a second. This is
a very fast Mac Pro, with 6GB of RAM.
Thanks,
Peter
On Aug 25, 2010, at 9:52 AM,
On Wed, Aug 25, 2010 at 10:55 AM, Eric Grobler
impalah...@googlemail.com wrote:
Thanks for the technical explanation.
I will in general try to use lex and sort by count in the client if there
are not too many rows.
I just developed a patch that may help this scenario:
Hello,
I just started to investigate Solr several weeks ago. Our current project uses
the Verity search engine, which is a commercial product whose company is out of
business. I am trying to evaluate whether Solr can meet our requirements. I have
the following questions.
1. Currently we use Verity and have
On Wed, Aug 25, 2010 at 2:50 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Wed, Aug 25, 2010 at 10:55 AM, Eric Grobler
impalah...@googlemail.com wrote:
Thanks for the technical explanation.
I will in general try to use lex and sort by count in the client if there
are not too many
Hi All,
Is there a way to increase the debugging level of Solr delta-query imports?
I would like to see the records that Solr has picked up written out to
standard output or a log file.
Thank You!
Kind regards,
Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.
On Aug 25, 2010, at 12:18 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
I just started to investigate Solr several weeks ago. Our current project
uses Verity search engine which is commercial product and the company is out
of business.
Verity is not out of business. They were acquired by
Thank you for letting me know. Does Autonomy still support Verity search
engine?
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Wednesday, August 25, 2010 3:41 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?
On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler impalah...@googlemail.com wrote:
Hi Solr experts,
There is a huge difference doing facet sorting on lex vs count
The strange thing is that count sorting is fast when setting a small limit.
I realize I can do sorting in the client, but I am just
1. Currently we use Verity and have more than 20 collections; each collection
has an index for public items and an index for private items. So there are
virtual collections which point to each collection and a virtual collection
which points to all. For example, we have AA and BB collections.
Hi,
I am trying to delete all documents that have null values for a certain
field. To that effect I can see all of the documents I want to delete by
doing this query:
-date_added_solr:[* TO *]
This returns about 32,000 documents.
However, when I try to put that into a curl call, no documents
We're starting to use Solr for our application. The data that we'll be
indexing will change often and not accumulate over time. This means that we
want to blow away our index and re-create it every hour or so. What's the
easiest way to do this while Solr is running without giving users a no-data
mraible wrote:
We're starting to use Solr for our application. The data that we'll be
indexing will change often and not accumulate over time. This means that we
want to blow away our index and re-create it every hour or so. What's the
easier way to do this while Solr is running and not give
Take a look at the Multicore feature, particularly the SWAP, CREATE and MERGE
actions.
Eric Pugh's "Solr 1.4 Enterprise Search Server" book has a good explanation.
Scott
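A sketch of the rebuild-and-swap approach with the CoreAdmin API (the core names live and rebuild are illustrative and must both be defined in solr.xml):

```sh
# Create a spare core to index into (its instanceDir must already exist):
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=rebuild&instanceDir=rebuild'
# ... reindex into the "rebuild" core, then swap it with the live core:
curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild'
```

Queries keep hitting the old index until the SWAP, so users never see an empty result set.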
- Original Message -
From: mraible m...@raibledesigns.com
To: solr-user@lucene.apache.org
Sent: Thursday, August 26, 2010 6:31
There are a couple of options here. Solr can fetch text from a file or
over HTTP given a URL. Look at the stream.file and stream.url
parameters. You can use these from EmbeddedSolr.
Also, there are 'ContentStream' objects in the SolrJ API which you can
use. Look at
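A sketch of those streaming parameters with the extracting handler (assumes /update/extract is configured and enableRemoteStreaming is turned on in solrconfig.xml; the paths and ids are made up):

```sh
# Let Solr read the file from its own disk instead of the request body:
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&stream.file=/data/report.pdf'
# Or let Solr fetch the content over HTTP itself:
curl 'http://localhost:8983/solr/update/extract?literal.id=doc2&stream.url=http://example.com/page.html&commit=true'
```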
This assumes that the HTML is good quality. I don't know exactly what
your use case is. If you're crawling the web you will find some very
screwed-up HTML.
On Wed, Aug 25, 2010 at 6:45 AM, Ken Krugler
kkrugler_li...@transpac.com wrote:
On Aug 24, 2010, at 10:55pm, Paul Libbrecht wrote:
Does this happen when you are indexing with many threads at once?
There are reports of sockets blocking and timing out during
multi-threaded indexing.
On Wed, Aug 25, 2010 at 6:40 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Wed, Aug 25, 2010 at 6:41 AM, Pooja Verlani
What you want is something called 'field collapsing'. This is a Solr
implementation that (at a high level) gives you one of these documents
and a report of how many more match the query. Collapsing multiple
product styles/colors/sizes to one consumer-visible product is a
common use case for this.
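In the Solr 1.4 era field collapsing lived in the SOLR-236 patch; as a sketch of how the feature later surfaced as result grouping (Solr 3.3+; the field name product_id is illustrative):

```sh
# One document per product_id group, plus a count of distinct groups:
curl 'http://localhost:8983/solr/select?q=shoes&group=true&group.field=product_id&group.ngroups=true'
```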
I am using SolrSearchBean inside my custom parse filter in Nutch 1.1. My
Solr/Nutch setup is working: I have Nutch crawl and index into Solr, and I am
able to search the Solr index with my Solr admin page. My Solr schema is
completely different from the one in Nutch. When I tried to query
Excuse me, what's the hyphen before the field name 'date_added_solr'? Is this
some kind of new query format that I didn't know about?
<delete><query>-date_added_solr:[* TO *]</query></delete>
- Original Message -
From: Max Lynch ihas...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday,
How much disk space is used by the index?
If you run the Lucene CheckIndex program, how many terms etc. does it report?
When you do the first facet query, how much does the memory in use grow?
Are you storing the text fields, or only indexing? Do you fetch the
facets only, or do you also fetch
Actually TagSoup's reason for existence is to clean up all of the
messy HTML that's out in the wild.
Tika's HTML parser wraps this, and uses it to generate the stream of
SAX events that it then consumes and turns into a normalized XHTML
1.0-compliant data stream.
-- Ken
On Aug 25, 2010,
There is a LogTransformer that logs data instead of adding it to the document:
http://www.lucidimagination.com/search/document/CDRG_ch06_6.4.7.3?q=logging
transformer
http://wiki.apache.org/solr/DataImportHandler#LogTransformer
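A data-config.xml sketch of the LogTransformer; the entity name, fields, and SQL are illustrative:

```xml
<!-- Log each row as DIH picks it up; output goes to Solr's log. -->
<entity name="item" transformer="LogTransformer"
        logTemplate="delta import picked up id=${item.id}" logLevel="info"
        query="SELECT id, name FROM item">
  <field column="id" name="id"/>
  <field column="name" name="name"/>
</entity>
```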
On Wed, Aug 25, 2010 at 12:35 PM, Vladimir Sutskever
We're currently building a Solr index with over 1.2 million documents. I
want to do a good stress test of it. Does anyone know if there's an
appropriate stress-testing tool for Solr? Or any good suggestions?
Best Regards,
Scott
On Wed, Aug 25, 2010 at 2:34 PM, Peter Spam ps...@mac.com wrote:
This is a very small number of documents (7000), so I am surprised Solr is
having such a hard time with it!!
I do facet on 3 terms.
Subsequent hello searches are faster, but still well over a second. This
is a very fast Mac
I was trying to filter out all documents that HAVE that field. I was trying
to delete any documents where that field had empty values.
I just found a way to do it, but I did a range query on a string date in the
Lucene DateTools format and it worked, so I'm satisfied. However, I believe
it
Cool! I did not know that Tika had such a thorough HTML parser.
On Wed, Aug 25, 2010 at 7:49 PM, Ken Krugler
kkrugler_li...@transpac.com wrote:
Actually TagSoup's reason for existence is to clean up all of the messy HTML
that's out in the wild.
Tika's HTML parser wraps this, and uses it to
I recommend JMeter. We use it to do load testing on a search server. Of
course you have to provide a reasonable set of queries as input; if you
don't have any, then a reasonable estimate based on your expected traffic
should suffice. JMeter can be used for other load testing too.
Be careful
Here's the problem: the standard Solr parser is a little weird about
negative queries. The way to make this work is to say
*:* AND -field:[* TO *]
This means select everything AND only these documents without a value
in the field.
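Applied to the field from earlier in the thread, a delete-by-query sketch (assumes the default /update handler):

```sh
# Purely negative queries match nothing here; anchor with *:* first.
curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>*:* AND -date_added_solr:[* TO *]</query></delete>'
```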
On Wed, Aug 25, 2010 at 7:55 PM, Max Lynch ihas...@gmail.com
Right now I am doing some processing on my Solr index using Lucene Java.
Basically, I loop through the index in Java and do some extra processing of
each document (processing that is too intensive to do during indexing).
However, when I try to update the document in solr with new fields (using
Thanks Lance. I'll give that a try going forward.
On Wed, Aug 25, 2010 at 9:59 PM, Lance Norskog goks...@gmail.com wrote:
Here's the problem: the standard Solr parser is a little weird about
negative queries. The way to make this work is to say
*:* AND -field:[* TO *]
This means select
It seems like this is a way to accomplish what I was looking for:
CoreContainer coreContainer = new CoreContainer();
File home = new File("/home/max/packages/test/apache-solr-1.4.1/example/solr");
File f = new File(home, "solr.xml");
We have about 500 million documents indexed. The index size is about 10 GB.
We are running on a 32-bit box. During pressure testing, we monitored that JVM
GC runs very frequently, about once every 5 minutes. Are there any tips for
tuning this?
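A hedged sketch of the usual first tuning knobs for JVMs of this era (the heap sizes are assumptions, not recommendations; a 32-bit JVM tops out around 2 GB):

```sh
# Fix the heap so it neither grows nor shrinks, use the concurrent
# collector to shorten pauses, and log GC activity to see what is happening.
java -Xms1536m -Xmx1536m \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -verbose:gc -XX:+PrintGCDetails \
     -jar start.jar
```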