reduce the content???

2010-08-25 Thread satya swaroop
Hi all, I indexed nearly 100 Java PDF files which are large (min 1MB each). Solr is showing the results with the entire content that it indexed, which is taking time to show the results. Can't we reduce the content it shows, or can I just have the file names and IDs instead of the entire
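Satya's problem is usually addressed with Solr's fl (field list) parameter, which restricts which stored fields come back in the response; a sketch, assuming the schema has fields named id and filename:

```text
http://localhost:8983/solr/select?q=java&fl=id,filename,score
```

Only the listed fields are returned, so the large extracted body text stays out of the response.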

Re: 'Error 404: missing core name in path ' in adminconsole

2010-08-25 Thread Robert Naczinski
Thanks for your help. I bound de.lvm.services.logging.PerformanceLoggingFilter in web.xml and mapped it to /admin/*. It works fine with EmbeddedSolr. I get a NullPointer in some links under admin/index.jsp, but I will solve this problem. Robert 2010/8/25 Chris Hostetter hossman_luc...@fucit.org:

Re: reduce the content???

2010-08-25 Thread Shalin Shekhar Mangar
On Wed, Aug 25, 2010 at 12:51 PM, satya swaroop sswaro...@gmail.com wrote: Hi all, I indexed nearly 100 Java PDF files which are large (min 1MB each). Solr is showing the results with the entire content that it indexed, which is taking time to show the results. Can't we reduce the

Re: SolrJ addField with Reader

2010-08-25 Thread Shalin Shekhar Mangar
On Tue, Aug 24, 2010 at 10:37 AM, Bojan Vukojevic email...@gmail.comwrote: I am using SolrJ with embedded Solr server and some documents have a lot of text. Solr will be running on a small device with very limited memory. In my tests I cannot process more than 3MB of text (in a body) with

SolrCloud ZooKeeper related exceptions

2010-08-25 Thread Yatir Ben Shlomo
Hi, I am running a ZooKeeper ensemble of 3 instances and established a SolrCloud to work with it (2 masters, 2 slaves). On each master machine I have 2 shards (4 shards in total). On one of the masters I keep noticing ZooKeeper-related exceptions which I can't understand: One appears to

Re: SolrException log

2010-08-25 Thread Tommaso Teofili
Hi again Bastian, 2010/8/23 Bastian Spitzer bspit...@magix.net I don't seem to find decent documentation on how those parameters actually work. This is the default example block: <deletionPolicy class="solr.SolrDeletionPolicy"> <!-- The number of commit points to be kept -->

Solr search speed very low

2010-08-25 Thread Andrey Sapegin
Dear ladies and gentlemen, I'm a newbie with Solr and I didn't find an answer in the wiki, so I'm writing here. I'm analysing Solr performance and have 1 problem: *search time is about 7-10 seconds per query.* I have a 5GB .csv database with about 15 fields and 1 key field (record number). I

Regd WSTX EOFException

2010-08-25 Thread Pooja Verlani
Hi, sometimes while indexing to Solr I am getting the following exception: com.ctc.wstx.exc.WstxEOFException: Unexpected end of input block in end tag. I think it's some configuration issue. Kindly suggest. I have Solr working with Tomcat 6. Thanks, Pooja

Re: Solr search speed very low

2010-08-25 Thread Marco Martinez
You should use the tokenizer solr.WhitespaceTokenizerFactory in your field type to get your terms indexed. Once you have indexed the data, you don't need to use the * in your queries; that is a heavy query for Solr. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26.
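Marco's suggestion corresponds to a schema.xml field type along these lines; the type name text_ws is an assumption for illustration:

```xml
<!-- whitespace-tokenized text type; the name is illustrative -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Fields of this type are split into terms on whitespace, so plain term queries match without leading or trailing wildcards.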

Slow facet sorting - lex vs count

2010-08-25 Thread Eric Grobler
Hi Solr experts, there is a huge difference doing facet sorting on lex vs count. The strange thing is that count sorting is fast when setting a small limit. I realize I can do the sorting in the client, but I am just curious why this is. FAST - 16ms facet.field=city f.city.facet.limit=5000
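For reference, the two orderings Eric compares map onto the facet.sort parameter; a sketch of the two variants, with the field and limit taken from the thread:

```text
# count order (the slow case at a high limit in this report)
/solr/select?q=*:*&facet=true&facet.field=city&f.city.facet.limit=5000&f.city.facet.sort=count
# index (lex) order (the fast case)
/solr/select?q=*:*&facet=true&facet.field=city&f.city.facet.limit=5000&f.city.facet.sort=lex
```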

Re: Slow facet sorting - lex vs count

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler impalah...@googlemail.com wrote: There is a huge difference doing facet sorting on lex vs count The strange thing is that count sorting is fast when setting a small limit. I realize I can do sorting in the client, but I am just curious why this is.

Re: Regd WSTX EOFException

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 6:41 AM, Pooja Verlani pooja.verl...@gmail.com wrote: Hi, Sometimes while indexing to solr, I am getting the following exception. com.ctc.wstx.exc.WstxEOFException: Unexpected end of input block in end tag I think it's some configuration issue. Kindly suggest. I have

Re: Restricting HTML search?

2010-08-25 Thread Ken Krugler
On Aug 24, 2010, at 10:55pm, Paul Libbrecht wrote: Wouldn't the usage of NekoHTML (as an XML parser) and XPath be safer? I guess it all depends on the quality of the source document. If you're processing HTML then you definitely want to use something like NekoHTML or TagSoup. Note

Re: Solr search speed very low

2010-08-25 Thread Geert-Jan Brits
have a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters to see how that works. 2010/8/25 Marco Martinez mmarti...@paradigmatecnologico.com You should use the tokenizer solr.WhitespaceTokenizerFactory in your field type to get your terms indexed, once you have indexed the

Re: Slow facet sorting - lex vs count

2010-08-25 Thread Eric Grobler
Hi Yonik, Thanks for your response. I use Solr 1.4.1. There are 14000 cities in the index. The type is just a simple string: <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/> The facet method is fc. You are right that I do not need 5000 cities, I was just surprised to see

Re: Slow facet sorting - lex vs count

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 10:07 AM, Eric Grobler impalah...@googlemail.com wrote: I use Solr 1.4.1. There are 14000 cities in the index. The type is just a simple string: <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/> The facet method is fc. You are right I do

Re: Slow facet sorting - lex vs count

2010-08-25 Thread Eric Grobler
Hi Yonik, Thanks for the technical explanation. I will in general try to use lex and sort by count in the client if there are not too many rows. Have a nice day. Regards, ericz On Wed, Aug 25, 2010 at 4:41 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Aug 25, 2010 at 10:07 AM,

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 11:29 AM, Peter Spam ps...@mac.com wrote: So, I went through all the effort to break my documents into max 1 MB chunks, and searching for hello still takes over 40 seconds (searching across 7433 documents): 8 results (41980 ms). What is going on??? (scroll

Distinct values versus schema change?

2010-08-25 Thread Willie Whitehead
Hi, I'm having a problem where a Solr query on all items in one category is returning duplicated items when an item appears in more than one subcategory. My schema involves a document for each item's subcategory instance. I know this is not correct. I'm not sure if I ever tried multiple values

Re: How to delete documents from SOLR index using DIH

2010-08-25 Thread Erick Erickson
I'm not sure what you mean here. You can delete via query or unique id, but DIH really isn't relevant here. If you've defined a unique key, simply re-adding any changed document will delete the old one and insert the new document. If this makes no sense, could you explain what the underlying
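Erick's two deletion routes can both be posted as XML to the update handler; the id and query values below are hypothetical:

```xml
<!-- delete by unique key -->
<delete><id>doc123</id></delete>
<!-- delete by query -->
<delete><query>category:obsolete</query></delete>
```

Either form needs to be followed by a `<commit/>` before the deletes become visible to searchers.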

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Peter Spam
This is a very small number of documents (7000), so I am surprised Solr is having such a hard time with it!! I do facet on 3 terms. Subsequent hello searches are faster, but still well over a second. This is a very fast Mac Pro, with 6GB of RAM. Thanks, Peter On Aug 25, 2010, at 9:52 AM,

Re: Slow facet sorting - lex vs count

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 10:55 AM, Eric Grobler impalah...@googlemail.com wrote: Thanks for the technical explanation. I will in general try to use lex and sort by count in the client if there are not too many rows. I just developed a patch that may help this scenario:

how to deal with virtual collection in solr?

2010-08-25 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Hello, I just started to investigate Solr several weeks ago. Our current project uses the Verity search engine, which is a commercial product, and the company is out of business. I am trying to evaluate whether Solr can meet our requirements. I have the following questions. 1. Currently we use Verity and have

Re: Slow facet sorting - lex vs count

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 2:50 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Aug 25, 2010 at 10:55 AM, Eric Grobler impalah...@googlemail.com wrote: Thanks for the technical explanation. I will in general try to use lex and sort by count in the client if there are not too many

Increasing Logging of Delta Queries

2010-08-25 Thread Vladimir Sutskever
Hi All, Is there a way to increase the debugging level of Solr delta query imports? I would like to see the records that have been picked up by Solr spit out to standard output or a log file. Thank You! Kind regards, Vladimir Sutskever Investment Bank - Technology JPMorgan Chase, Inc.

Re: how to deal with virtual collection in solr?

2010-08-25 Thread Walter Underwood
On Aug 25, 2010, at 12:18 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: I just started to investigate Solr several weeks ago. Our current project uses Verity search engine which is commercial product and the company is out of business. Verity is not out of business. They were acquired by

RE: how to deal with virtual collection in solr?

2010-08-25 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thank you for letting me know. Does Autonomy still support Verity search engine? -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Wednesday, August 25, 2010 3:41 PM To: solr-user@lucene.apache.org Subject: Re: how to deal with virtual collection in solr?

Re: Slow facet sorting - lex vs count

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler impalah...@googlemail.com wrote: Hi Solr experts, There is a huge difference doing facet sorting on lex vs count The strange thing is that count sorting is fast when setting a small limit. I realize I can do sorting in the client, but I am just

Re: how to deal with virtual collection in solr?

2010-08-25 Thread Jan Høydahl / Cominvent
1. Currently we use Verity and have more than 20 collections; each collection has an index for public items and an index for private items. So there are virtual collections which point to each collection and a virtual collection which points to all. For example, we have AA and BB collections.

Delete by query issue

2010-08-25 Thread Max Lynch
Hi, I am trying to delete all documents that have null values for a certain field. To that effect, I can see all of the documents I want to delete with this query: -date_added_solr:[* TO *] This returns about 32,000 documents. However, when I try to put that into a curl call, no documents

Create a new index while Solr is running

2010-08-25 Thread mraible
We're starting to use Solr for our application. The data that we'll be indexing will change often and not accumulate over time. This means that we want to blow away our index and re-create it every hour or so. What's the easiest way to do this while Solr is running without giving users a no data

Re: Create a new index while Solr is running

2010-08-25 Thread Ron Mayer
mraible wrote: We're starting to use Solr for our application. The data that we'll be indexing will change often and not accumulate over time. This means that we want to blow away our index and re-create it every hour or so. What's the easiest way to do this while Solr is running and not give

Re: Create a new index while Solr is running

2010-08-25 Thread 朱炎詹
Take a look at the Multicore feature, particularly the SWAP, CREATE and MERGE actions. Eric Pugh's "Solr 1.4 Enterprise Search Server" book has a good explanation. Scott - Original Message - From: mraible m...@raibledesigns.com To: solr-user@lucene.apache.org Sent: Thursday, August 26, 2010 6:31
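The rebuild-and-swap approach Scott points to can be driven through the CoreAdmin handler; the core names and directories here are assumptions for illustration:

```text
# build the new index into a spare core, then atomically swap it with the live one
/solr/admin/cores?action=CREATE&name=rebuild&instanceDir=core0&dataDir=rebuild_data
/solr/admin/cores?action=SWAP&core=live&other=rebuild
```

Searchers keep hitting the "live" core name throughout, so users never see an empty index during the hourly rebuild.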

Re: SolrJ addField with Reader

2010-08-25 Thread Lance Norskog
There are a couple of options here. Solr can fetch text from a file, or from HTTP given a URL. Look at the stream.file and stream.url parameters. You can use these from EmbeddedSolr. Also, there are 'ContentStream' objects in the SolrJ API which you can use. Look at
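The stream.file route Lance describes looks roughly like this; the path, handler, and field are illustrative, and remote streaming must be enabled in solrconfig.xml for stream.file/stream.url to work:

```text
curl "http://localhost:8983/solr/update/extract?stream.file=/data/doc.pdf&literal.id=doc1&commit=true"
```

Solr reads the file server-side, so the large body never has to be buffered in the client's memory.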

Re: Restricting HTML search?

2010-08-25 Thread Lance Norskog
This assumes that the HTML is good quality. I don't know exactly what your use case is. If you're crawling the web you will find some very screwed-up HTML. On Wed, Aug 25, 2010 at 6:45 AM, Ken Krugler kkrugler_li...@transpac.com wrote: On Aug 24, 2010, at 10:55pm, Paul Libbrecht wrote:

Re: Regd WSTX EOFException

2010-08-25 Thread Lance Norskog
Does this happen when you are indexing with many threads at once? There are reports of sockets blocking and timing out during multi-threaded indexing. On Wed, Aug 25, 2010 at 6:40 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Aug 25, 2010 at 6:41 AM, Pooja Verlani

Re: Distinct values versus schema change?

2010-08-25 Thread Lance Norskog
What you want is something called 'field collapsing'. This is a Solr feature that (at a high level) gives you one of these documents and a report of how many more match the query. Collapsing multiple product styles/colors/sizes into one consumer-visible product is a common use case for this.

How to set custom fields for SolrSearchBean Query in Nutch?

2010-08-25 Thread Savannah Beckett
I am using SolrSearchBean inside my custom parse filter in Nutch 1.1. My solr/nutch setup is working. I have Nutch crawl and index into Solr, and I am able to search the Solr index with my Solr admin page. My Solr schema is completely different from the one in Nutch. When I tried to query

Re: Delete by query issue

2010-08-25 Thread 朱炎詹
Excuse me, what's the hyphen before the field name 'date_added_solr'? Is this some kind of new query format that I didn't know? <delete><query>-date_added_solr:[* TO *]</query></delete> - Original Message - From: Max Lynch ihas...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday,

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Lance Norskog
How much disk space is used by the index? If you run the Lucene CheckIndex program, how many terms etc. does it report? When you do the first facet query, how much does the memory in use grow? Are you storing the text fields, or only indexing? Do you fetch the facets only, or do you also fetch

Re: Restricting HTML search?

2010-08-25 Thread Ken Krugler
Actually TagSoup's reason for existence is to clean up all of the messy HTML that's out in the wild. Tika's HTML parser wraps this, and uses it to generate the stream of SAX events that it then consumes and turns into a normalized XHTML 1.0-compliant data stream. -- Ken On Aug 25, 2010,

Re: Increasing Logging of Delta Queries

2010-08-25 Thread Lance Norskog
There is a LogTransformer that logs data instead of adding it to the document: http://www.lucidimagination.com/search/document/CDRG_ch06_6.4.7.3?q=logging transformer http://wiki.apache.org/solr/DataImportHandler#LogTransformer On Wed, Aug 25, 2010 at 12:35 PM, Vladimir Sutskever
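A minimal sketch of the LogTransformer Lance mentions in a DIH entity; the table, columns, and message template are hypothetical:

```xml
<!-- logs one line per row pulled by the delta/full import -->
<entity name="item" transformer="LogTransformer"
        query="SELECT id, name FROM item"
        logTemplate="processed item ${item.id}" logLevel="info">
  <field column="id" name="id"/>
  <field column="name" name="name"/>
</entity>
```

The logged lines go through Solr's normal logging configuration, so they can be routed to stdout or a file from there.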

Is there any stress test tool for testing Solr?

2010-08-25 Thread 朱炎詹
We're currently building a Solr index with over 1.2 million documents. I want to do a good stress test of it. Does anyone know if there's an appropriate stress test tool for Solr? Or any good suggestions? Best Regards, Scott

Re: Solr searching performance issues, using large documents (now 1MB documents)

2010-08-25 Thread Yonik Seeley
On Wed, Aug 25, 2010 at 2:34 PM, Peter Spam ps...@mac.com wrote: This is a very small number of documents (7000), so I am surprised Solr is having such a hard time with it!! I do facet on 3 terms. Subsequent hello searches are faster, but still well over a second.  This is a very fast Mac

Re: Delete by query issue

2010-08-25 Thread Max Lynch
I was trying to filter out all documents that HAVE that field; that is, I was trying to delete any documents where that field had empty values. I just found a way to do it: I did a range query on a string date in the Lucene DateTools format and it worked, so I'm satisfied. However, I believe it

Re: Restricting HTML search?

2010-08-25 Thread Lance Norskog
Cool! I did not know that Tika had a thorough, careful HTML parser. On Wed, Aug 25, 2010 at 7:49 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Actually TagSoup's reason for existence is to clean up all of the messy HTML that's out in the wild. Tika's HTML parser wraps this, and uses it to

Re: Is there any stress test tool for testing Solr?

2010-08-25 Thread Amit Nithian
I recommend JMeter. We use it to do load testing on a search server. Of course, you have to provide a reasonable set of queries as input; if you don't have any, then a reasonable estimation based on your expected traffic should suffice. JMeter can be used for other load testing too. Be careful
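A typical headless JMeter run against a prepared test plan looks like this; the file names are placeholders:

```shell
# run a saved test plan non-interactively and log sample results
jmeter -n -t solr-queries.jmx -l results.jtl
```

The .jmx plan would contain HTTP samplers hitting /solr/select, ideally parameterized from a CSV of realistic queries as Amit suggests.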

Re: Delete by query issue

2010-08-25 Thread Lance Norskog
Here's the problem: the standard Solr parser is a little weird about negative queries. The way to make this work is to say *:* AND -field:[* TO *] This means: select everything, AND only those documents without a value in the field. On Wed, Aug 25, 2010 at 7:55 PM, Max Lynch ihas...@gmail.com
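Wrapped in a delete message for Max's field, Lance's fix looks like:

```xml
<delete><query>*:* AND -date_added_solr:[* TO *]</query></delete>
```

Posting this to the update handler (followed by a commit) removes every document that has no value in date_added_solr.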

Duplicating a Solr Doc

2010-08-25 Thread Max Lynch
Right now I am doing some processing on my Solr index using Lucene Java. Basically, I loop through the index in Java and do some extra processing of each document (processing that is too intensive to do during indexing). However, when I try to update the document in Solr with new fields (using

Re: Delete by query issue

2010-08-25 Thread Max Lynch
Thanks Lance. I'll give that a try going forward. On Wed, Aug 25, 2010 at 9:59 PM, Lance Norskog goks...@gmail.com wrote: Here's the problem: the standard Solr parser is a little weird about negative queries. The way to make this work is to say *:* AND -field:[* TO *] This means select

Re: Duplicating a Solr Doc

2010-08-25 Thread Max Lynch
It seems like this is a way to accomplish what I was looking for: CoreContainer coreContainer = new CoreContainer(); File home = new File("/home/max/packages/test/apache-solr-1.4.1/example/solr"); File f = new File(home, "solr.xml");
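Filled out, the embedded approach Max sketches would look roughly like this against the Solr 1.4.1 SolrJ API; the path and core name are examples, and this is a sketch that needs the solr-core and solrj jars on the classpath:

```java
import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
        // point the container at a Solr home directory and its solr.xml
        File home = new File("/home/max/packages/test/apache-solr-1.4.1/example/solr");
        CoreContainer container = new CoreContainer();
        container.load(home.getAbsolutePath(), new File(home, "solr.xml"));

        // "core0" is an assumed core name from solr.xml
        SolrServer server = new EmbeddedSolrServer(container, "core0");

        // ... add/query documents through server here ...

        container.shutdown();
    }
}
```

Because it runs in-process, the embedded server avoids HTTP overhead, which suits Max's loop-and-reprocess use case.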

JVM GC is very frequent.

2010-08-25 Thread Chengyang
We have about 500 million documents indexed. The index size is about 10GB, running on a 32-bit box. During the pressure testing, we observed that JVM GC runs very frequently, about once every 5 minutes. Are there any tips for tuning this?
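Era-appropriate starting points for this are pinning the heap bounds and switching to the concurrent collector, while logging GC activity to confirm what is actually happening; the sizes below are illustrative, and note a 32-bit JVM caps the heap at roughly 2GB regardless:

```shell
# illustrative 32-bit-safe JVM settings with verbose GC logging
java -Xms1536m -Xmx1536m \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -verbose:gc -XX:+PrintGCDetails \
     -jar start.jar
```

If frequent collections persist at a pinned heap, the working set likely exceeds what a 32-bit JVM can hold, and moving to a 64-bit JVM with a larger heap is the usual next step.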