cores vs indices

2011-08-08 Thread Daniel Schobel
Can someone provide me with a succinct definition of what a Solr core is? Is there a one-to-one relationship of cores to Solr indices or can you have multiple indices per core? Cheers, Daniel

Re: cores vs indices

2011-08-08 Thread Dave Stuart
Hi Daniel, Yes, there is a one-to-one relationship between Solr indices and cores. The one-to-many comes when you look at the relationship between cores and Tomcat/Jetty webapp instances. This gives you the ability to clone, add and swap cores around. See for core manipulation functions:

Can Master push data to slave

2011-08-08 Thread Pawan Darira
Hi, I am using Solr 1.4 and running replication where my slave pulls data from the master. I have 2 questions: a. Can the master push data to the slave? b. How can I make sure that a lock file is not created during replication? Please help. Thanks, Pawan

string cut-off filter?

2011-08-08 Thread Bernd Fehling
Hi list, is there a string cut-off filter to limit the length of a KeywordTokenized string? So the string should not be dropped, only limited to a certain length. Regards Bernd

Scoring using POJO/SolrJ

2011-08-08 Thread Kissue Kissue
Hi, I am using the SolrJ client library and using a POJO with the @Field annotation to index documents and to retrieve documents from the index. I retrieve the documents from the index like so: List<Item> beans = response.getBeans(Item.class); Now in order to add the scores to the beans i added a

how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Li Li
hi all, I read the Apache Solr 3.1 release notes today and found that MMapDirectory is now the default implementation on 64-bit systems. I am now using Solr 1.4 with a 64-bit JVM on Linux. How can I use MMapDirectory? Will it improve performance?

Multiplexing TokenFilter for multi-language?

2011-08-08 Thread cnyee
Sorry if this has already been discussed, but I have already spent a couple of days googling in vain. The problem: - documents in multiple languages (us, de, fr, es). - language is known (a team of editors determines the language manually, and users are asked to specify language option for

RE: how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Dyer, James
If you want to try MMapDirectory with Solr 1.4, then copy the class org.apache.solr.core.MMapDirectoryFactory from 3.x or Trunk, and either add it to the .war file (you can just add it under src/java and re-package the war), or you can put it in its own .jar file in the lib directory under

PositionIncrement gap and multi-valued fields.

2011-08-08 Thread Luis Cappa Banda
Hello! I have a doubt about the behaviour of searching over field types that have positionIncrementGap defined. For example, suppose that: 1. We have a field called test defined as multi-valued and white space tokenized. 2. The index has a single document with a test value: str TEST1

Re: Weighted facet strings

2011-08-08 Thread Jonathan Rochkind
One kind of hacky way to accomplish some of those tasks involves creating a lot more Solr fields. (This kind of 'de-normalization' is often the answer to how to make Solr do something). So facet fields are ordinarily not tokenized or normalized at all. But that doesn't work very well for

Re: how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Rich Cariens
We patched our 1.4.1 build with SOLR-1969 (https://issues.apache.org/jira/browse/SOLR-1969, making MMapDirectory configurable) and realized a 64% search performance boost on our Linux hosts. On Mon, Aug 8, 2011 at 10:05 AM, Dyer, James james.d...@ingrambook.com wrote: If you want to try

solr-ruby: Error undefined method `closed?' for nil:NilClass

2011-08-08 Thread Ian Connor
Hi, I have seen some of these errors come through from time to time. It looks like: /usr/lib/ruby/1.8/net/http.rb:1060:in `request' /usr/lib/ruby/1.8/net/http.rb:845:in `post' /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:158:in `post'

Re: solr-ruby: Error undefined method `closed?' for nil:NilClass

2011-08-08 Thread Erik Hatcher
Ian - What does your solr-ruby using code look like? Solr::Connection is light-weight, so you could just construct a new one of those for each request. Are you keeping an instance around? Erik On Aug 8, 2011, at 12:03 , Ian Connor wrote: Hi, I have seen some of these errors

edismax configuration

2011-08-08 Thread Mark juszczec
Hello all Can someone direct me to a link with config info in order to allow use of the edismax QueryHandler? Mark

is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
I am trying to list some data based on a function I run, specifically termfreq(post_text,'indie music'), and I am unable to do it without passing in data to the q parameter. Is it possible to get a sorted list without searching for any terms?

Test failures on lucene_solr_3_3 and branch_3x

2011-08-08 Thread Shawn Heisey
I've got a consistent test failure on Solr source code checked out from svn. The same thing happens with 3.3 and branch_3x. I have information saved from the failures on branch_3x, which I have gotten to fail about a dozen times in a row. It fails on a test called

Re: is it possible to do a sort without query?

2011-08-08 Thread Alexei Martchenko
You can use the standard query parser and pass q=*:* 2011/8/8 Jason Toy jason...@gmail.com I am trying to list some data based on a function I run, specifically termfreq(post_text,'indie music'), and I am unable to do it without passing in data to the q parameter. Is it possible to get a
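The request Alexei describes can be sketched as a plain query string: a match-all q plus a sort on the function. This is only a minimal illustration of the request shape. The host, port and /select path are assumptions, the class and method names are hypothetical, and the field name post_text is taken from Jason's question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Solr or SolrJ.
final class SortWithoutQuerySketch {

    // Builds the parameters for "match every document, order by a function".
    static Map<String, String> matchAllSortedBy(String sortExpression, int rows) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "*:*");                              // match-all query
        params.put("sort", sortExpression);
        params.put("rows", Integer.toString(rows));
        return params;
    }

    // Joins the parameters with '&' and percent-encodes each value.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            joined.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String query = toQueryString(
                matchAllSortedBy("termfreq(post_text,'indie music') desc", 100));
        // Assumed local Solr URL, for illustration only.
        System.out.println("http://localhost:8983/solr/select?" + query);
    }
}
```

Encoding each value separately keeps the '&' separators between parameters intact, which is easy to get wrong when the query string is assembled by hand.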

bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
Alexei, thank you, that does seem to work. My sort results seem to be totally wrong though, I'm not sure if it's because of my sort function or something else. My query consists of: sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100 And I get back 4571232 hits. All the results don't

solr 3.1, not indexing entire document?

2011-08-08 Thread dhastings
hi, i have my solr field text configured as per earlier discussion: <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter

Re: solr 3.1, not indexing entire document?

2011-08-08 Thread Markus Jelsma
Check your maxFieldLength setting. hi, i have my solr field text configured as per earlier discussion: <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/>

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Yury Kats
On 8/8/2011 4:34 PM, Jason Toy wrote: Alexei, thank you, that does seem to work. My sort results seem to be totally wrong though, I'm not sure if it's because of my sort function or something else. My query consists of: sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100 And I

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma
Alexei, thank you, that does seem to work. My sort results seem to be totally wrong though, I'm not sure if it's because of my sort function or something else. My query consists of: sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100 And I get back 4571232 hits. That's

Re: edismax configuration

2011-08-08 Thread Markus Jelsma
http://wiki.apache.org/solr/CommonQueryParameters#defType Hello all Can someone direct me to a link with config info in order to allow use of the edismax QueryHandler? Mark

Re: edismax configuration

2011-08-08 Thread Mark juszczec
Got it. Thank you. I thought this was going to be much more difficult than it actually was. Mark On Mon, Aug 8, 2011 at 4:50 PM, Markus Jelsma markus.jel...@openindex.io wrote: http://wiki.apache.org/solr/CommonQueryParameters#defType Hello all Can someone direct me to a link with

Re: PivotFaceting in solr 3.3

2011-08-08 Thread Erik Hatcher
As far as I know, there isn't a patch for pivot faceting for 3.x. It'd require extracting the code from trunk and porting it. Perhaps as easy as applying the diff from the pivot commit from trunk to the 3.x codebase? (but probably not quite that easy) Erik On Aug 3, 2011, at 00:58

Re: string cut-off filter?

2011-08-08 Thread karsten-solr
Hi Bernd, I also searched for such a filter but did not find it. Best regards Karsten P.S. I am now using this filter: public class CutMaxLengthFilter extends TokenFilter { public CutMaxLengthFilter(TokenStream in) { this(in, DEFAULT_MAXLENGTH); }

Re: Dispatching a query to multiple different cores

2011-08-08 Thread Erik Hatcher
You could use Solr's distributed (shards parameter) capability to do this. However, if you've got somewhat different schemas that isn't necessarily going to work properly. Perhaps unify your schemas in order to facilitate this using Solr's distributed search feature? Erik On Aug 3,
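As a rough sketch of the distributed-search approach Erik mentions, the request goes to one core and lists every core to be searched in the shards parameter, as comma-separated host:port/path entries without a scheme. The hosts, core names and the example query below are assumptions, and the class is a hypothetical helper rather than anything shipped with Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ.
final class ShardsQuerySketch {

    // entryCoreUrl receives the request; every entry in shards is searched and merged.
    static String distributedQueryUrl(String entryCoreUrl, String query, List<String> shards) {
        return entryCoreUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&shards=" + String.join(",", shards);
    }

    public static void main(String[] args) {
        // Assumed two-core setup on one host, for illustration only.
        System.out.println(distributedQueryUrl(
                "http://localhost:8983/solr/core0",
                "title:solr",
                List.of("localhost:8983/solr/core0", "localhost:8983/solr/core1")));
    }
}
```

This only works cleanly when the cores agree on the fields being queried and on the unique key, which is the schema caveat raised in the reply.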

Re: solr 3.1, not indexing entire document?

2011-08-08 Thread dhastings
that was it... thanks. obviously the document is well over 2 mgs.

Re: string cut-off filter?

2011-08-08 Thread Markus Jelsma
There is none indeed, except using copyField and maxChars. Could you perhaps come up with some regex that replaces the group of chars beyond the desired limit and replace it with '' ? That would fit in a pattern replace char filter. Hi Bernd, I also searched for such a filter but did not
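One possible shape for the regex Markus suggests, shown here with plain java.util.regex rather than inside Solr: capture at most N characters and drop whatever follows. The class name and the limit of 10 are illustrative assumptions, and whether the same pattern behaves identically inside a pattern replace char filter has not been checked here.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or Lucene.
final class CutOffRegexSketch {

    // Group 1 holds at most maxLength chars; the rest of the input is matched and discarded.
    // DOTALL lets '.' match line breaks, so multi-line values are cut as well.
    static Pattern cutOffPattern(int maxLength) {
        return Pattern.compile("^(.{0," + maxLength + "}).*$", Pattern.DOTALL);
    }

    // Note: the limit counts UTF-16 chars, not Unicode code points.
    static String cutOff(String value, int maxLength) {
        return cutOffPattern(maxLength).matcher(value).replaceFirst("$1");
    }

    public static void main(String[] args) {
        System.out.println(cutOff("abcdefghijklmnop", 10));
        System.out.println(cutOff("short", 10));
    }
}
```

Values shorter than the limit pass through unchanged, so nothing is dropped, which matches what Bernd asked for.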

Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Fred Smith
Hi, Currently we are in the process of figuring out how to deal with millions of CSV files containing weather data (20+ million files). Each file is about 500 bytes in size. We want to calculate statistics on fields read from the file. For example, the standard deviation of wind speed across all

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
Are not Dismax queries able to search for phrases using the default index (which is what I am using)? If I can already do phrase searches, I don't understand why I would need to reindex to be able to access phrases from a function. On Mon, Aug 8, 2011 at 1:49 PM, Markus Jelsma

Example Solr Config on EC2

2011-08-08 Thread Matt Shields
I'm looking for some examples of how to set up Solr on EC2. The configuration I'm looking for would have multiple nodes for redundancy. I've tested in-house with a single master and slave with replication running in Tomcat on Windows Server 2003, but even if I have multiple slaves the single

Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Walter Underwood
This does not seem well matched to Solr. Solr and Lucene are optimized to show the best few matches, not every match. I'd use Hadoop for this. Or MarkLogic, if you'd like to talk about that off-list. wunder Lead Engineer, MarkLogic On Aug 8, 2011, at 1:59 PM, Fred Smith wrote: Hi,
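To illustrate why this kind of aggregate does not need a search index, a standard deviation can be accumulated in a single pass over the values, whatever tool ends up reading the files. The sketch below uses Welford's online algorithm. It is not taken from the thread, the class name is hypothetical, and the sample values are made up.

```java
// Hypothetical helper illustrating a one-pass (streaming) standard deviation.
final class RunningStats {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the current mean

    // Welford's update: numerically stable and needs no second pass over the data.
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    double mean() {
        return mean;
    }

    // Population standard deviation; NaN when no values have been added.
    double populationStdDev() {
        return count > 0 ? Math.sqrt(m2 / count) : Double.NaN;
    }

    public static void main(String[] args) {
        RunningStats windSpeed = new RunningStats();
        // Made-up wind speed readings, for illustration only.
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            windSpeed.add(v);
        }
        System.out.println(windSpeed.mean() + " " + windSpeed.populationStdDev());
    }
}
```

Because only three numbers are kept per statistic, partial results from separate workers can be produced independently, which is the kind of job a batch framework such as Hadoop is suited to.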

Re: Dispatching a query to multiple different cores

2011-08-08 Thread Jonathan Rochkind
However, if you unify your schemas to do this, I'd consider whether you really want separate cores/shards in the first place. If you want to search over all of them together, what are your reasons to put them in separate solr indexes in the first place? Ordinarily, if you want to search over

Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Markus Jelsma
Hi, Currently we are in the process of figuring out how to deal with millions of CSV files containing weather data (20+ million files). Each file is about 500 bytes in size. We want to calculate statistics on fields read from the file. For example, the standard deviation of wind speed across

Re: Example Solr Config on EC2

2011-08-08 Thread Yury Kats
On 8/8/2011 5:03 PM, Matt Shields wrote: I'm looking for some examples of how to set up Solr on EC2. The configuration I'm looking for would have multiple nodes for redundancy. I've tested in-house with a single master and slave with replication running in Tomcat on Windows Server 2003, but

Re: csv responsewriter and numfound

2011-08-08 Thread Erik Hatcher
Great question. But how would that get returned in the response? It is a drag that the header is lost when results are written in CSV, but there really isn't an obvious spot for that information to be returned. Erik On Aug 4, 2011, at 01:52 , Pooja Verlani wrote: Hi, Is there

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jonathan Rochkind
Dismax queries can. But sort=termfreq(all_lists_text,'indie+music') is not using dismax. Apparently the termfreq function cannot? I am not familiar with the termfreq function. To understand why you'd need to reindex, you might want to read up on how lucene actually works, to get a basic

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma
Are not Dismax queries able to search for phrases using the default index (which is what I am using)? If I can already do phrase searches, I don't understand why I would need to reindex to be able to access phrases from a function. Executing a Lucene phrase query is not the same as term

Re: csv responsewriter and numfound

2011-08-08 Thread Yonik Seeley
On Mon, Aug 8, 2011 at 5:12 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Great question.  But how would that get returned in the response? It is a drag that the header is lost when results are written in CSV, but there really isn't an obvious spot for that information to be returned. I

Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Jonathan Rochkind
On 8/8/2011 5:10 PM, Markus Jelsma wrote: Will the StatsComponent in Solr do what we need with minimal configuration? Can the StatsComponent only be used on a subset of the data? For example, only look at data from certain months? If i remember correctly, it cannot. Well, if you index things

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma
Dismax queries can. But sort=termfreq(all_lists_text,'indie+music') is not using dismax. Apparently the termfreq function cannot? I am not familiar with the termfreq function. It simply returns the TF of the given _term_ as it is indexed of the current document. Sorting on TF like this

Re: Can Master push data to slave

2011-08-08 Thread Markus Jelsma
Hi, Hi I am using Solr 1.4 and doing a replication process where my slave is pulling data from Master. I have 2 questions: a. Can Master push data to slave? Not in current versions. Not sure about exotic patches for this. b. How to make sure that lock file is not created while

Re: Example Solr Config on EC2

2011-08-08 Thread mbohlig
Matthew, Here's another resource: http://www.lucidimagination.com/blog/2010/02/01/solr-shines-through-the-cloud-lucidworks-solr-on-ec2/ Michael Bohlig Lucid Imagination - Original Message From: Matt Shields m...@mattshields.org To: solr-user@lucene.apache.org Sent: Mon, August 8,

Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Fred Smith
Thank you Walter, Markus and Jonathan for your fast responses and help! We will be looking into CouchDB (and Hadoop if necessary) to process our data. Thanks again, Fred

Re: Is anobdy using lotsofcores feature in production?

2011-08-08 Thread Uomesh
Hi Shalin, Does this mean that even if I apply the patch mentioned at the link below, Solr still does not support lots of cores? https://issues.apache.org/jira/browse/SOLR-1293 Are you saying this is just a concept and the patch is not an implementation? We are planning to use lots of cores in our commerce system

Re: Can Master push data to slave

2011-08-08 Thread simon
You could configure a PostCommit event listener on the master which would send a HTTP fetchindex request to the slave you want to carry out replication - see http://wiki.apache.org/solr/SolrReplication#HTTP_API But why do you want the master to push to the slave ? -Simon On Mon, Aug 8, 2011 at
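The request Simon refers to is an ordinary HTTP call to the slave's replication handler with command=fetchindex. A minimal sketch of what a listener on the master might send follows. The slave base URL is an assumption, the class is hypothetical, and wiring it into a postCommit listener is left out.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Solr.
final class FetchIndexSketch {

    // Builds the replication-handler URL that asks a slave to pull the index now.
    static URI fetchIndexUri(String slaveBaseUrl) {
        return URI.create(slaveBaseUrl + "/replication?command=fetchindex");
    }

    // Sends the request and returns the HTTP status code.
    static int requestFetchIndex(String slaveBaseUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) fetchIndexUri(slaveBaseUrl).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Assumed slave address, for illustration only; no request is sent here.
        System.out.println(fetchIndexUri("http://slave-host:8983/solr"));
    }
}
```

Even with this trigger the slave still pulls the index itself, so it approximates a push rather than implementing one.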

Re: Same id on two shards

2011-08-08 Thread simon
Only one should be returned, but it's non-deterministic. See http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations -Simon On Sat, Aug 6, 2011 at 6:27 AM, Pooja Verlani pooja.verl...@gmail.com wrote: Hi, We have a multicore solr with 6 cores. We merge the results

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
I am trying to test out and compare different sorts and scoring. When I use dismax to search for indie music with: qf=all_lists_text&q=indie+music&defType=dismax&rows=100 I see some stuff that seems irrelevant, meaning in top results I see only 1 or 2 mentions of indie music, but when I look

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma
If you want to understand and debug the scoring you can use debugQuery=true to see how different documents score. Most of the time docs with both terms are on top of the result set unless norms are interfering. To understand, you should check the Solr relevancy wiki but the Lucene docs are

Re: Same id on two shards

2011-08-08 Thread Shawn Heisey
On 8/8/2011 4:07 PM, simon wrote: Only one should be returned, but it's non-deterministic. See http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations I had heard it was based on which one responded first. This is part of why we have a small index that contains the

Re: merge factor performance

2011-08-08 Thread Erick Erickson
What version of Solr are you using? And how are you sending your docs to Solr? Bumping your JVM size and bumping your RAM size to 128M also might help.. How are you sending your docs to Solr? And where are you getting them from? Are you sure that Solr is your problem or is it your data

Re: MultiSearcher/ParallelSearcher - searching over multiple cores?

2011-08-08 Thread Erick Erickson
I think you'll have to make this go yourself, I don't see how to make Solr do it for you. And even if it could, the scores aren't comparable, so combining them for presentation to the user will be interesting Best Erick On Thu, Aug 4, 2011 at 2:27 PM, Ralf Musick ra...@gmx.de wrote: Hi Erik,

Re: Records skipped when using DataImportHandler

2011-08-08 Thread Erick Erickson
Spend some time in the admin/analysis page, that'll show you what part of the analysis chain is doing what to your data. It'll save you a world of headache... But at a guess WordDelimiterFilterFactory is your culprit... Best Erick On Thu, Aug 4, 2011 at 6:08 PM, anand sridhar

Re: Same id on two shards

2011-08-08 Thread simon
I think the first one to respond is indeed the way it works, but that's only deterministic up to a point (if your small index is in the throes of a commit and everything required for a response happens to be cached on the larger shard ... who knows ?) On Mon, Aug 8, 2011 at 7:10 PM, Shawn Heisey

Re: Suggestions for copying fields across cores...

2011-08-08 Thread Erick Erickson
Not that I know of. Separate cores are pretty distinct to Solr, so you're probably stuck with doing it by sending the request to each core... Best Erick On Fri, Aug 5, 2011 at 5:51 PM, josh lucas j...@lucasjosh.com wrote: Is there a suggested way to copy data in fields to additional fields that

Re: how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Li Li
thank you. I will try it. On Mon, Aug 8, 2011 at 11:18 PM, Rich Cariens richcari...@gmail.com wrote: We patched our 1.4.1 build with SOLR-1969 (https://issues.apache.org/jira/browse/SOLR-1969, making MMapDirectory configurable) and realized a 64% search performance boost on our Linux hosts. On