Re: performance crossover between single index and sharding

2011-08-04 Thread Bernd Fehling
Hi Shawn, the 0.05 seconds for search time at peak times (3 qps) is my target for Solr. The numbers for Solr are from Solr's statistics report page. So 39.5 seconds average per request is definitely too long and I have to change to sharding. For FAST system the numbers for the search dispatcher

Re: segment.gen file is not replicated

2011-08-04 Thread Bernd Fehling
I have now updated to Solr 3.3 but segment.gen is still not replicated. Any idea why? Is it a bug or a feature? Should I write a JIRA issue for it? Regards Bernd On 29.07.2011 14:10, Bernd Fehling wrote: Dear list, is there a deeper logic behind why the segment.gen file is not replicated

Re: Update some fields for all documents: LUCENE-1879 vs. ParallelReader .FilterIndex

2011-08-04 Thread karsten-solr
Hi Erick, thanks a lot! This looks like a good idea: Our queries with the changeable fields fit the join-idea from https://issues.apache.org/jira/browse/SOLR-2272 because - we do not need relevance ranking - we can separate in a conjunction of a query with the changeable fields and our other

Re: segment.gen file is not replicated

2011-08-04 Thread Michael McCandless
This file is actually optional; it's there for redundancy in case the filesystem is not reliable when listing a directory. I.e., normally, we list the directory to find the latest segments_N file; but if this is wrong (e.g. the file system might have a stale cache) then we fall back to reading the
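The directory-listing step described in this message can be sketched roughly as follows. This is a plain-Java illustration, not Lucene's actual code: the class and method names are hypothetical, and only the "list the directory and pick the latest segments_N" step is shown, not the fallback to the .gen file. It relies on the N in segments_N being the commit generation written in base 36.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Lucene or Solr.
class LatestSegmentsSketch {
    static final String PREFIX = "segments_";

    /** Returns the highest generation among segments_N names, or -1 if none is found. */
    static long latestGeneration(List<String> fileNames) {
        long latest = -1;
        for (String name : fileNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // skips segments.gen and all other index files
            }
            try {
                // The suffix is the generation in base 36 (Character.MAX_RADIX).
                long gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
                latest = Math.max(latest, gen);
            } catch (NumberFormatException e) {
                // Suffix is not a base-36 number, so this is not a commit file; ignore it.
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList("_0.cfs", "segments.gen", "segments_9", "segments_a");
        long gen = latestGeneration(listing);
        System.out.println(PREFIX + Long.toString(gen, Character.MAX_RADIX));
    }
}
```

If the listing itself is stale, this step picks an outdated generation, which is the situation the redundant .gen file is meant to cover.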

Re: A rant about field collapsing

2011-08-04 Thread Martijn v Groningen
The development of the field collapse feature is a long and confusing story. The main point is that SOLR-236 was never going to scale and the performance in general was bad. A new approach was needed. This was implemented in SOLR-1682 and added to the trunk (4.0-dev) around September last year.

Re: Solr 3.3 crashes after ~18 hours?

2011-08-04 Thread alexander sulz
Thank you for the many replies! Like I said, I couldn't find anything in logs created by Solr. I just had a look at the /var/logs/messages and there wasn't anything either. What I mean by crash is that the process is still there and http GET pings would return 200 but when I try visiting

Re: segment.gen file is not replicated

2011-08-04 Thread Bernd Fehling
On 04.08.2011 12:52, Michael McCandless wrote: This file is actually optional; it's there for redundancy in case the filesystem is not reliable when listing a directory. I.e., normally, we list the directory to find the latest segments_N file; but if this is wrong (e.g. the file system might

Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)

2011-08-04 Thread thomas
Concerning the downtime, we found a solution that works well for us. We already implemented an update mechanism so that when authors change some content in the CMS, the index entry for this piece of content gets updated (delete, then index again) as well. All we had to do is: 1. Change the

Unbuffered entity enclosing request can not be repeated Invalid chunk header

2011-08-04 Thread Vadim Kisselmann
Hello folks, I use Solr 1.4.1 and every 2 to 6 hours I have indexing errors in my log files. On the client side: 2011-08-04 12:01:18,966 ERROR [Worker-242] IndexServiceImpl - Indexing failed with SolrServerException. Details: org.apache.commons.httpclient.ProtocolException: Unbuffered entity

Re: performance crossover between single index and sharding

2011-08-04 Thread Peter Keegan
We have 16 shards on 4 physical servers. Shard size was determined by measuring query response times as a function of doc count. Multiple shards per server provides parallelism. In a VM environment, I would lean towards 1 shard per VM (with 1/4 the RAM). We implemented our own distributed search

RE: performance crossover between single index and sharding

2011-08-04 Thread Bob Sandiford
Dumb question time - you are using a 64 bit Java, and not a 32 bit Java? Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] Sent:

Re: performance crossover between single index and sharding

2011-08-04 Thread Bernd Fehling
java version 1.6.0_21 Java(TM) SE Runtime Environment (build 1.6.0_21-b06) Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode) java: file format elf64-x86-64 Including the -d64 switch. On 04.08.2011 14:40, Bob Sandiford wrote: Dumb question time - you are using a 64 bit Java,

Re: A rant about field collapsing

2011-08-04 Thread baronDodd
Ok thank you very much for clearing that up a little. I think another reason I was confused was that the wiki page for grouping was based around the original field collapsing plan at the time which led me to the jira and hence the patch files, rant over! Perhaps you can help to clarify if the

Re: A rant about field collapsing

2011-08-04 Thread Martijn v Groningen
Well, the original page moved to: http://wiki.apache.org/solr/FieldCollapsingUncommitted Assuming that you're using Solr 3.3 you can't get the grouped result (<lst name="grouped">) with SolrJ. I added grouping support to SolrJ some time ago and it will be in Solr 3.4. You can use a nightly 3.x build to

Re: Solr 3.3 crashes after ~18 hours?

2011-08-04 Thread Yonik Seeley
On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz a.s...@digiconcept.net wrote: Thank you for the many replies! Like I said, I couldn't find anything in logs created by Solr. I just had a look at the /var/logs/messages and there wasn't anything either. What I mean by crash is that the process

RE: Strategies for sorting by array, when you can't sort by array?

2011-08-04 Thread Olson, Ron
For anyone who comes across this topic in the future, I solved the problem this way: by agreement with the stakeholders, on the presumption that no one would look at more than 5000 records, I modified my search code so that, if the user selected to sort by the name, I set the row count to

Re: performance crossover between single index and sharding

2011-08-04 Thread Shawn Heisey
On 8/4/2011 12:38 AM, Bernd Fehling wrote: Hi Shawn, the 0.05 seconds for search time at peak times (3 qps) is my target for Solr. The numbers for Solr are from Solr's statistics report page. So 39.5 seconds average per request is definitely too long and I have to change to sharding. Solr

Re: Solr 3.3 crashes after ~18 hours?

2011-08-04 Thread Manish Bafna
Check out physical memory/virtual memory usage. RAM usage might be less but physical memory usage goes up as you index more documents. It might be because of MMapDirectory, which uses MappedByteBuffer. On Thu, Aug 4, 2011 at 7:38 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Aug 4,

RE: Joining on multi valued fields

2011-08-04 Thread matthew . fowler
Hi Yonik So I tested the join using the sample data below and the latest trunk. I still got the same behaviour. HOWEVER! In this case it was nothing to do with the patch or solr version. It was the tokeniser splitting G1 into G and 1. So thank you for a nice patch and your suggestions. I do

Indexing tweet and searching @keyword OR #keyword

2011-08-04 Thread Mohammad Shariq
I have indexed around 1 million tweets (using the text data type). When I search the tweets with # or @ I don't get exact results. E.g. when I search for #ipad OR @ipad I get results where ipad is mentioned, ignoring the # and @. Please suggest how to tune this, or what are filterFactories to

Re: Joining on multi valued fields

2011-08-04 Thread Yonik Seeley
On Thu, Aug 4, 2011 at 11:21 AM, matthew.fow...@thomsonreuters.com wrote: Hi Yonik So I tested the join using the sample data below and the latest trunk. I still got the same behaviour. HOWEVER! In this case it was nothing to do with the patch or solr version. It was the tokeniser

Re: Indexing tweet and searching @keyword OR #keyword

2011-08-04 Thread Jonathan Rochkind
It's the WordDelimiterFilterFactory in your filter chain that's removing the punctuation entirely from your index, I think. Read up on what the WordDelimiter filter does, and what its settings are; decide how you want things to be tokenized in your index to get the behavior you want; either get
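To see why #ipad and @ipad end up matching plain ipad, here is a minimal sketch in plain Java. It is a simplified stand-in, not the real WordDelimiter filter (which has many more rules and options): the class and method names are hypothetical, and the two methods only contrast splitting on punctuation with splitting on whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for two analysis strategies; not Solr's analyzer code.
class TokenizeSketch {
    /** Splits on any run of non-alphanumeric characters, so '#' and '@' are dropped. */
    static List<String> splitOnPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Splits on whitespace only, so '#' and '@' stay attached to the term. */
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String tweet = "#ipad rocks @ipad";
        System.out.println(splitOnPunctuation(tweet));
        System.out.println(splitOnWhitespace(tweet));
    }
}
```

With the first strategy both the hashtag and the mention index as the same term; with the second they remain distinct terms, which is the kind of behavior the original question asks for.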

How can I create a good autosuggest list with phrases?

2011-08-04 Thread Shawn Heisey
I'm at the point in my Solr deployment where I want to start using it for autosuggest, but I've run into a snag. Because the fields that I want to use for autosuggest are tokenized, I can only get single terms out of it. I would like to have it find common phrases that are between two and

Re: How can I create a good autosuggest list with phrases?

2011-08-04 Thread Sethi, Parampreet
We handled a similar requirement in our product kitchendaily.com by creating a list of search terms which were frequently searched over a period of time and then building an auto-suggestion index from this data. The constant updates of this will allow you to support a well-formed auto-suggest feature.

Re: How can I create a good autosuggest list with phrases?

2011-08-04 Thread Shawn Heisey
On 8/4/2011 10:04 AM, Sethi, Parampreet wrote: We handled a similar requirement in our product kitchendaily.com by creating a list of search terms which were frequently searched over a period of time and then building an auto-suggestion index from this data. The constant updates of this will allow

merge factor performance

2011-08-04 Thread Naveen Gupta
Hi, We have a requirement to index almost 100,000 documents (at least 20 fields each). No field is longer than 10 KB. We are also running searches in parallel against the same index. We found that it takes almost 3 minutes to index all the documents.

Re: merge factor performance

2011-08-04 Thread Naveen Gupta
Sorry, it is taking 3 minutes for 15k docs. On Thu, Aug 4, 2011 at 10:07 PM, Naveen Gupta nkgiit...@gmail.com wrote: Hi, We have a requirement to index almost 100,000 documents (at least 20 fields each). No field is longer than 10 KB. Also we

Re: segment.gen file is not replicated

2011-08-04 Thread Michael McCandless
I think we should fix replication to copy it? Mike McCandless http://blog.mikemccandless.com On Thu, Aug 4, 2011 at 8:16 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: On 04.08.2011 12:52, Michael McCandless wrote: This file is actually optional; it's there for redundancy in case

Re: using distributed search with the suggest component

2011-08-04 Thread mdz-munich
Hi Tobias, sadly, it seems you are right. After a little investigation we also noticed that some names (we use it for auto-completing author names) are missing. And since it is a distributed setup ... But I am almost sure it worked with Solr 3.2. Best regards, Sebastian --

Re: Solr 3.3 crashes after ~18 hours?

2011-08-04 Thread Stephen Duncan Jr
On Thu, Aug 4, 2011 at 10:08 AM, Yonik Seeley yo...@lucidimagination.com wrote: ignores means what?  The request hangs?  If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*) ? thous throwing no errors, no 503's.. It's like the server has a blackout and stares

Re: MultiSearcher/ParallelSearcher - searching over multiple cores?

2011-08-04 Thread Ralf Musick
Hi Erik, I have several types with different properties, but they are supposed to be combined to one search. Imagine a book with property title and a journal with property name. (the types in my project have of course more complex properties.) So I created a new core with combined

Re: Is there anyway to sort differently for facet values?

2011-08-04 Thread Way Cool
Thanks Eric for your reply. I am aware of facet.sort, but I haven't used it. I will try that though. Can it handle the values below in the correct order? Under 10, 10 - 20, 20 - 30, Above 30. Or: Small, Medium, Large, XL ... My second question is that if Solr can't do that for the values above by using

What's the best way (practice) to do index distribution at this moment? Hadoop? rsyncd?

2011-08-04 Thread Way Cool
Hi, guys, What's the best way (practice) to do index distribution at this moment? Hadoop? or rsyncd (back to 3 years ago ;-)) ? Thanks, Yugang

Re: Is there anyway to sort differently for facet values?

2011-08-04 Thread Jonathan Rochkind
No, it can not. It just sorts alphabetically, actually by raw byte-order. No other facet sorting functionality is available, and it would be tricky to implement in a performant way because of the way lucene works. But it would certainly be useful to me too if someone could figure out a way
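One possible workaround, not suggested in the thread itself, is to re-sort the facet labels on the client after Solr returns them. The sketch below is plain Java and assumes the application knows the display order it wants; the class name, method name, and the preferred-order list are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical client-side helper; Solr itself is not involved here.
class FacetOrderSketch {
    /**
     * Returns a copy of labels sorted by position in preferredOrder.
     * Labels missing from preferredOrder go last, in natural string order.
     */
    static List<String> reorder(List<String> labels, List<String> preferredOrder) {
        List<String> sorted = new ArrayList<>(labels);
        sorted.sort(Comparator
                .comparingInt((String label) -> {
                    int i = preferredOrder.indexOf(label);
                    return i < 0 ? Integer.MAX_VALUE : i;
                })
                .thenComparing(Comparator.naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        // The order an alphabetical (index-order) facet sort would produce.
        List<String> fromSolr = Arrays.asList("10 - 20", "20 - 30", "Above 30", "Under 10");
        List<String> wanted = Arrays.asList("Under 10", "10 - 20", "20 - 30", "Above 30");
        System.out.println(reorder(fromSolr, wanted));
    }
}
```

This only works when all facet values are fetched, since the re-sort happens after Solr has already applied its own ordering and any limit.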

Re: lucene/solr, raw indexing/searching

2011-08-04 Thread dhastings
I have decided to use Solr for indexing as well. The types of searches I'm doing are professional/academic. So, for example, I need to match all of the following exactly from my source data: 3.56, 4 harv. l. rev. 45, 187-532, 3 llm 56, 5 unts 8, 6 u.n.t.s. 78,

Re: What's the best way (practice) to do index distribution at this moment? Hadoop? rsyncd?

2011-08-04 Thread Jonathan Rochkind
I'm not sure what you mean by index distribution, that could possibly mean several things. But Solr has had a replication feature built into it from 1.4, that can probably handle the same use cases as rsync, but better. So that may be what you want. There are certainly other experiments

Re: lucene/solr, raw indexing/searching

2011-08-04 Thread Jonathan Rochkind
It depends. Okay, the source contains 4 harv. l. rev. 45 . Do you want a user entered harv. to ALSO match harv (without the period) in source, and vice versa? Or do you require it NOT match? Or do you not care? The default filter analysis chain will index 4 harv. l. rev. 45 essentially as

Re: Is there anyway to sort differently for facet values?

2011-08-04 Thread Sethi, Parampreet
It can be achieved by creating your own (app-specific) custom comparators for fields defined in schema.xml and having an extra attribute to specify the comparator class in the field tag itself. But it will require changes in Solr to support this feature. (Not sure if it's feasible though just

Re: What's the best way (practice) to do index distribution at this moment? Hadoop? rsyncd?

2011-08-04 Thread Way Cool
Yes, I am talking about the replication feature. I remember I tried rsync 3 years ago with Solr 1.2. Just not sure if someone else has done anything better than that during the last 3 years. ;-) Personally I am thinking about using Hadoop and ZooKeeper. Has anyone tried those features? I found a

deleting index directory/files

2011-08-04 Thread Mark juszczec
Hello all, I'm using multiple cores. There's a directory named by the core and it contains a subdir named data that contains a subdir named index that contains a bunch of files that contain the data for my index. Let's say I want to completely rebuild the index from scratch. Can I delete the

RE: deleting index directory/files

2011-08-04 Thread Olson, Ron
I ran into a problem when I deleted just the index directory; I deleted the entire data directory and it was recreated on the next load. BTW, if you're using the DIH, its default behavior is to remove all records on a full import, so you can save yourself having to remove any actual files.

Json update using HttpURLConnection

2011-08-04 Thread Sharath Jagannath
I am trying to post the JSON update request using java.net.HttpURLConnection. Parameters I am using: url : http://localhost:8983/solr/update/json?commit=true String data = "[{\"id\" : \"TestDoc7\", \"title\" : \"test 7\"}, {\"id\" : \"TestDoc8\", \"title\" : \"another test 8\"}]"; uri += + data;
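For readers attempting the same thing, here is a minimal sketch of sending the JSON as the POST body rather than appending it to the URL. It uses only the JDK; the class name and the jsonDoc helper are hypothetical, the escaping covers only quotes and backslashes, and whether this matches the poster's eventual fix is unknown since the bug was never described.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class JsonUpdateSketch {
    /** Escapes backslashes and double quotes only; control characters are not handled. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds one JSON document with the id and title fields used in the message above. */
    static String jsonDoc(String id, String title) {
        return "{\"id\":\"" + escape(id) + "\",\"title\":\"" + escape(title) + "\"}";
    }

    /** POSTs the JSON as the request body and returns the HTTP status code. */
    static int postJson(String url, String json) throws IOException {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true); // required before writing a request body
            conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String data = "[" + jsonDoc("TestDoc7", "test 7") + "," + jsonDoc("TestDoc8", "another test 8") + "]";
        if (args.length == 0) {
            System.out.println(data); // no URL given: just show the payload
            return;
        }
        System.out.println(postJson(args[0], data));
    }
}
```

Passing http://localhost:8983/solr/update/json?commit=true as the first argument would send the two documents to the URL quoted in the message, assuming a Solr instance is listening there.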

Re: Minimum Score

2011-08-04 Thread Kissue Kissue
Hi, I am using Solr 3.1 with the SolrJ client library. I can see that it is possible to get the maximum score for your search by using the following: response.getResults().getMaxScore() I am wondering is there some simple solution to get the minimum score? Many thanks.

SOLR Support for Span Queries

2011-08-04 Thread Joshua Harness
How does one issue span queries in SOLR (Span, SpanNear, etc)? I've done a bit of research and it seems that these are not supported. It would seem that I need to implement a QueryParserPlugin to accomplish this. Is this the correct path? Surely this has been done before. Does anybody have links

Re: Records skipped when using DataImportHandler

2011-08-04 Thread anand sridhar
Ok. After analysis, I traced the reduced result set to the fact that the zipcode field is not indexed 'as is', i.e. the zipcode field values are broken down into tokens and then stored. Hence, if there are 10 documents with zipcode fields varying from 91000-91009, then the zipcode fields are not

Re: Minimum Score

2011-08-04 Thread Darren Govoni
Off the top of my head, maybe you can get the number of results and then look at the last document and check its score. I believe the results will be ordered by score? On 08/04/2011 05:44 PM, Kissue Kissue wrote: Hi, I am using Solr 3.1 with the SolrJ client library. I can see that it is

Re: Json update using HttpURLConnection

2011-08-04 Thread Sharath Jagannath
Never mind, it was some stupid bug. Figured it out. Cheers, Sharath On Thu, Aug 4, 2011 at 2:35 PM, Sharath Jagannath shotsonclo...@gmail.com wrote: I am trying to post the JSON update request using java.net.HttpURLConnection. Parameters I am using: url :

Loading huge synonym list in Solr

2011-08-04 Thread Arun Atreya
Hello, I would like to know the best way to load a huge synonym list into Solr. I would like to do concept indexing (a.k.a category indexing) with Solr. For example, I want to be able to index all cities and be able to search for all of them using a special keyword, say 'CONCEPTcity', where

Re: Loading huge synonym list in Solr

2011-08-04 Thread Robert Muir
https://issues.apache.org/jira/browse/LUCENE-3233 On Thu, Aug 4, 2011 at 7:24 PM, Arun Atreya my.2.pai...@gmail.com wrote: Hello, I would like to know the best way to load a huge synonym list into Solr. I would like to do concept indexing (a.k.a category indexing) with Solr. For example, I

Copy Fields while Replication

2011-08-04 Thread Pawan Darira
Hi, I would like to know whether I can add new fields while replicating the index on the slave. E.g. my master has an index with field F1, which is created with type string. Now, I don't want F1 as type string, but I also have the limitation that I cannot change the field type at the schema level. Now, if I replicate

Solr DIH import - Date Question

2011-08-04 Thread solruser@9913
This is perhaps a 'truly newbie' question. I am processing some files via DIH handler/XPATH Processor. Some of the date fields in the XML are in 'Java Long format' i.e. just a big long number. I am wondering how to map them to a Solr date field. I used the DIH DateFormatTransformer for some other

Re: Solr DIH import - Date Question

2011-08-04 Thread Lance Norskog
You might have to do this with an external script. The DIH lets you process fields with JavaScript or Groovy. Also, somewhere in the DIH you can give an XSL stylesheet instead of just an XPath. On Thu, Aug 4, 2011 at 10:31 PM, solruser@9913 gunaranj...@yahoo.com wrote: This is perhaps a 'truly
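Whichever hook is used, the conversion itself is small. The sketch below is plain Java and assumes the "Java long" values are milliseconds since the Unix epoch, which the question does not confirm; the class and method names are hypothetical, and it is not wired into the DIH. It produces the UTC timestamp form that Solr date fields accept, such as 1995-12-31T23:59:59Z.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only; assumes epoch milliseconds as input.
class SolrDateSketch {
    private static final DateTimeFormatter SOLR_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    /** Formats epoch milliseconds as a UTC timestamp; sub-second precision is dropped. */
    static String toSolrDate(long epochMillis) {
        return SOLR_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate(1312416000000L));
    }
}
```

If the source values turn out to be seconds rather than milliseconds, they would need to be multiplied by 1000 before formatting.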