Re: solr 3.5 taking long to index

2012-04-12 Thread Bernd Fehling
There were some changes in solrconfig.xml between Solr 3.1 and Solr 3.5. Always read CHANGES.txt when switching to a new version. It also helps to compare both versions of solrconfig.xml from the examples. Are you sure you need a MaxPermSize of 5g? Use jvisualvm to see what you really need. This

Re: Multi-words synonyms matching

2012-04-12 Thread elisabeth benoit
oh, that's right. thanks a lot, Elisabeth 2012/4/11 Jeevanandam Madanagopal je...@myjeeva.com Elisabeth - As you described, the mapping below might suit your need. mairie => hotel de ville, mairie. The term mairie gets expanded to hotel de ville and mairie at index time. So mairie and hotel de

Re: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Jan Høydahl
What operating system? Are you using spellchecker with buildOnCommit? Anything special in your Update Chain? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 12. apr. 2012, at 06:45, Rohit wrote: We recently migrated from

Re: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Tirthankar Chatterjee
Hi Rohit, What would be the average size of your documents and also can you please share your idea of having 2 cores in the master. I just wanted to know the reasoning behind the design. Thanks in advance Tirthankar On Apr 12, 2012, at 3:19 AM, Jan Høydahl wrote: What operating system?

Re: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Tirthankar Chatterjee
Hi Rohit, Can you please check the solrconfig.xml in 3.5 and compare it with 3.1 if there are any warming queries specified while opening the searchers after a commit. Thanks, Tirthankar On Apr 12, 2012, at 3:30 AM, Tirthankar Chatterjee wrote: Hi Rohit, What would be the average size of

RE: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Rohit
Hi Tirthankar, The average size of documents would be a few KB; these are mostly tweets being saved. The two cores store different kinds of data and nothing else. Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg -Original Message- From: Tirthankar

RE: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Rohit
Operating system is Linux (Ubuntu). No, not using the spellchecker. Only language detection in my update chain. Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg -Original Message- From: Jan Høydahl [mailto:jan@cominvent.com] Sent: 12 April 2012 12:50 To:

Re: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Tirthankar Chatterjee
thanks Rohit, for the information. On Apr 12, 2012, at 4:08 AM, Rohit wrote: Hi Tirthankar, The average size of documents would be a few KB; these are mostly tweets being saved. The two cores store different kinds of data and nothing else. Regards, Rohit Mobile:

Problem to integrate Solr in Jetty (the first example in the Apache Solr 3.1 Cookbook)

2012-04-12 Thread Bastian Hepp
Hi, I'm using Apache Solr 3.5.0 and Jetty 8.1.2 with Windows 7. (Versions in the Book used... Solr 3.1, Jetty 6.1.26) I've tried to get Solr running with Jetty. - I copied the jetty.xml and the webdefault.xml from the example Solr. - I copied the solr.war to webapps - I copied the solr directory

Re: Facets involving multiple fields

2012-04-12 Thread Marc SCHNEIDER
Hi, Thanks for your answer. Let's say I have two fields: 'keywords' and 'short_title'. For these fields I'd like to make a faceted search: if 'Computer' is stored in at least one of these fields for a document, I'd like to get it added to my results. doc1 = keywords : 'Computer' / short_title :

Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
Given an input of Windjacke (probably wind jacket in English), I'd like the code that prepares the data for the index (tokenizer etc) to understand that this is a Jacke (jacket) so that a query for Jacke would include the Windjacke document in its result set. It appears to me that such an

Re: Large Index and OutOfMemoryError: Map failed

2012-04-12 Thread Michael McCandless
Your largest index has 66 segments (690 files) ... biggish but not insane. With 64K maps you should be able to have ~47 searchers open on each core. Enabling compound file format (not the opposite!) will mean fewer maps ... ie should improve this situation. I don't understand why Solr defaults

codecs for sorted indexes

2012-04-12 Thread Carlos Gonzalez-Cadenas
Hello, We're using a sorted index in order to implement early termination efficiently over an index of hundreds of millions of documents. As of now, we're using the default codecs coming with Lucene 4, but we believe that due to the fact that the docids are sorted, we should be able to do much

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
Given an input of Windjacke (probably wind jacket in English), I'd like the code that prepares the data for the index (tokenizer etc) to understand that this is a Jacke (jacket) so that a query for Jacke would include the Windjacke document in its result set. It appears to me that such an

Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
Michael, I've been on this list and the Lucene list for several years and have not found this yet. It's been one of the neglected topics, to my taste. There is a CompoundAnalyzer but it requires the compounds to be dictionary based, as you indicate. I am convinced there's a way to build the

Re: Lexical analysis tools for German language data

2012-04-12 Thread Bernd Fehling
You might have a look at: http://www.basistech.com/lucene/ On 12.04.2012 11:52, Michael Ludwig wrote: Given an input of Windjacke (probably wind jacket in English), I'd like the code that prepares the data for the index (tokenizer etc) to understand that this is a Jacke (jacket) so that a

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-12 Thread pcrao
Hi Mikhail Khludnev, Thank you for the reply. I think the index is getting corrupted because StreamingUpdateSolrServer is keeping reference to some index files that are being deleted by EmbeddedSolrServer during commit/optimize process. As a result when I Index(Full) using EmbeddedSolrServer and

Re: Lexical analysis tools for German language data

2012-04-12 Thread Valeriy Felberg
If you want the query jacke to match a document containing the word windjacke or kinderjacke, you could use a custom update processor. This processor could search the indexed text for words matching the pattern .*jacke and inject the word jacke into an additional field which you can search
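
A minimal sketch of the matching step Valeriy describes, written in Python rather than as a real Solr UpdateRequestProcessor (whose Java API is not reproduced here). The function names, the head-noun list, and the target field name text_heads are hypothetical; in Solr the same logic would live inside a custom processor in the update chain.

```python
import re

# Hypothetical list of head nouns to look for; "jacke" comes from the thread.
HEAD_NOUNS = ["jacke"]


def injected_terms(text, heads=HEAD_NOUNS):
    """Return each head noun that ends a longer word in text (Windjacke -> jacke)."""
    found = []
    for word in re.findall(r"\w+", text.lower()):
        for head in heads:
            # Mirrors the .*jacke pattern, but requires at least one leading
            # character so that the bare word "jacke" is not re-injected.
            if re.fullmatch(".+" + re.escape(head), word) and head not in found:
                found.append(head)
    return found


def enrich(doc, source="text", target="text_heads"):
    """Return a copy of doc with the injected terms in an extra (hypothetical) field."""
    enriched = dict(doc)
    enriched[target] = injected_terms(doc.get(source, ""))
    return enriched


print(enrich({"id": "1", "text": "Rote Windjacke und Kinderjacke"}))
# {'id': '1', 'text': 'Rote Windjacke und Kinderjacke', 'text_heads': ['jacke']}
```

A query for jacke on text_heads would then match this document. Because this is a plain suffix match, it will also tag words that merely happen to end in the suffix; that is the trade-off against a dictionary-based approach.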

Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
Bernd, can you please say a little more? I think this list is ok to contain some description for commercial solutions that satisfy a request formulated on list. Is there any product at BASIS Tech that provides a compound-analyzer with a big dictionary of decomposed compounds in German? If yes,

Solr Scoring

2012-04-12 Thread Kissue Kissue
Hi, I have a field in my index called itemDesc which i am applying EnglishMinimalStemFilterFactory to. So if i index a value to this field containing Edges, the EnglishMinimalStemFilterFactory applies stemming and Edges becomes Edge. Now when i search for Edges, documents with Edge score better

two structures in solr

2012-04-12 Thread tkoomzaaskz
Hi all, I'm a solr newbie, so sorry if I do anything wrong ;) I want to use SOLR not only for fast text search, but mainly to create a very fast search engine for a high-traffic system (MySQL would not do the job if the db grows too big). I need to store *two big structures* in SOLR: projects

Re: Question about solr.WordDelimiterFilterFactory

2012-04-12 Thread Erick Erickson
WordDelimiterFilterFactory will _almost_ do what you want by setting things like catenateWords=0 and catenateNumbers=1, _except_ that the punctuation will be removed. So 12.34 -> 1234 and ab,cd -> ab cd. Is that close enough? Otherwise, writing a simple Filter is probably the way to go. Best Erick On

Dismax request handler differences Between Solr Version 3.5 and 1.4

2012-04-12 Thread mechravi25
Hi, We are currently using Solr (version 1.4.0.2010.01.13.08.09.44). We have a strange situation with the dismax request handler: when we search for a keyword and append qt=dismax, we are not getting any results. The Solr request is as follows:

Re: Facets involving multiple fields

2012-04-12 Thread Erick Erickson
facet.query=keywords:computer short_title:computer seems like what you're asking for. On Thu, Apr 12, 2012 at 3:19 AM, Marc SCHNEIDER marc.schneide...@gmail.com wrote: Hi, Thanks for your answer. Let's say I have two fields: 'keywords' and 'short_title'. For these fields I'd like to make a
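
A sketch of Erick's suggestion as an HTTP request, built with Python's standard urllib. The base URL assumes the example Jetty setup on localhost:8983 with a single core, and q=*:* plus rows=0 are illustrative choices that return only the counts. Assuming the default OR operator, the single facet.query counts documents that contain computer in either field.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint: the Solr example's default host, port and core.
SOLR_SELECT = "http://localhost:8983/solr/select"


def facet_query_url(term):
    params = [
        ("q", "*:*"),
        ("rows", "0"),          # only the facet counts are of interest here
        ("facet", "true"),
        # A single facet.query spanning both fields; with the default OR
        # operator it counts documents that have the term in either field.
        ("facet.query", f"keywords:{term} short_title:{term}"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


url = facet_query_url("computer")
print(url)
# Decoding the query string recovers the original parameter values.
print(parse_qs(urlparse(url).query)["facet.query"])
```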

Re: Lexical analysis tools for German language data

2012-04-12 Thread Bernd Fehling
Paul, nearly two years ago I requested an evaluation license and tested BASIS Tech Rosette for Lucene Solr. It worked excellently, but the price was much, much too high. Yes, they also have compound analysis for several languages including German. Just configure your pipeline in Solr and set up the

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Darren Govoni
You could use SolrCloud (for the automatic scaling) and just mount a fuse[1] HDFS directory and configure solr to use that directory for its data. [1] https://ccp.cloudera.com/display/CDHDOC/Mountable+HDFS On Thu, 2012-04-12 at 16:04 +0300, Ali S Kureishy wrote: Hi, I'm trying to setup a

is there a downside to combining search fields with copyfield?

2012-04-12 Thread geeky2
hello everyone, can people give me their thoughts on this. currently, my schema has individual fields to search on. are there advantages or disadvantages to taking several of the individual search fields and combining them into a single search field? would this affect search times, term

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
From: Valeriy Felberg If you want the query jacke to match a document containing the word windjacke or kinderjacke, you could use a custom update processor. This processor could search the indexed text for words matching the pattern .*jacke and inject the word jacke into an additional field

Re: Lexical analysis tools for German language data

2012-04-12 Thread Markus Jelsma
Hi, We've done a lot of tests with the HyphenationCompoundWordTokenFilter using an FOP XML file generated from TeX for the Dutch language and have seen decent results. A bonus was that now some tokens can be stemmed properly because not all compounds are listed in the dictionary for the

Further questions about behavior in ReversedWildcardFilterFactory

2012-04-12 Thread neosky
I asked the question in http://lucene.472066.n3.nabble.com/A-little-onfusion-with-maxPosAsterisk-tt3889226.html However, while implementing it, I have further questions. 1. Suppose I don't use ReversedWildcardFilterFactory at index time; it seems that Solr doesn't allow the leading

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Ali S Kureishy
Thanks Darren. Actually, I would like the system to be homogenous - i.e., use Hadoop based tools that already provide all the necessary scaling for the lucene index (in terms of throughput, latency of writes/reads etc). Since SolrCloud adds its own layer of sharding/replication that is outside

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
From: Markus Jelsma We've done a lot of tests with the HyphenationCompoundWordTokenFilter using an FOP XML file generated from TeX for the Dutch language and have seen decent results. A bonus was that now some tokens can be stemmed properly because not all compounds are listed in the

RE: Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Darren Govoni
SolrCloud or any other tech-specific replication isn't going to 'just work' with Hadoop replication. But with some significant custom coding anything should be possible. Interesting idea. --- Original Message --- On 4/12/2012 09:21 AM Ali S Kureishy wrote: Thanks Darren.

Re: Question about solr.WordDelimiterFilterFactory

2012-04-12 Thread Jian Xu
Erick, Thank you for your response!  The problem with this approach is that searching for 12:34 will also match 12.34 which is not what I want. From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org; Jian Xu joseph...@yahoo.com Sent:

RE: SOLR 3.3 DIH and Java 1.6

2012-04-12 Thread randolf.julian
Thanks guys for all the help. We moved to an upgraded O.S. version and the java script worked. - Randolf -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-3-3-DIH-and-Java-1-6-tp3841355p3905583.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: solr 3.4 with nTiers = 2: usage of ids param causes NullPointerException (NPE)

2012-04-12 Thread Dmitry Kan
Can anyone help me out with this? Is this too complicated / unclear? I could share more detail if needed. On Wed, Apr 11, 2012 at 3:16 PM, Dmitry Kan dmitry@gmail.com wrote: Hello, Hopefully this question is not too complex to handle, but I'm currently stuck with it. We have a system

Re: Error

2012-04-12 Thread Erick Erickson
Please review: http://wiki.apache.org/solr/UsingMailingLists You haven't said whether, for instance, you're using trunk which is the only version that supports the termfreq function. Best Erick On Thu, Apr 12, 2012 at 4:08 AM, Abhishek tiwari abhishek.tiwari@gmail.com wrote:

Import null values from XML file

2012-04-12 Thread randolf.julian
We import an XML file directly into Solr using the post.sh script in the exampledocs directory. This is the script: FILES=$* URL=http://localhost:8983/solr/update for f in $FILES; do echo Posting file $f to $URL curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8' echo done

Re: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
German noun decompounding is a little more complicated than it might seem. There can be transformations or inflections, like the s in Weihnachtsbaum (Weihnachten/Baum). Internal nouns should be recapitalized, like Baum above. Some compounds probably should not be decompounded, like Fahrrad
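
A toy illustration of dictionary-based decompounding with an optional linking element (the "s" in Arbeitszimmer), written in Python. It is not Lucene's DictionaryCompoundWordTokenFilter; the tiny dictionary, the min_part length and the single linker "s" are assumptions. Words already in the dictionary are kept whole, which is one way to leave Fahrrad alone. It does not handle transformations such as Weihnachten -> Weihnachts-, where letters are dropped as well as added, nor the recapitalization Walter mentions.

```python
def decompound(word, dictionary, linkers=("s",), min_part=3):
    """Split a compound into dictionary words, allowing a linking element
    (Fugenmorphem) such as "s" before each non-initial part.
    Returns None when no complete split exists."""
    word = word.lower()
    if word in dictionary:
        # Known words (e.g. Fahrrad) are deliberately kept whole.
        return [word]
    for i in range(min_part, len(word) - min_part + 1):
        head, rest = word[:i], word[i:]
        if head not in dictionary:
            continue
        # Try the remainder as-is, then with a leading linking element removed.
        tails = [rest] + [rest[len(link):] for link in linkers if rest.startswith(link)]
        for tail in tails:
            parts = decompound(tail, dictionary, linkers, min_part)
            if parts:
                return [head] + parts
    return None


WORDS = {"wind", "jacke", "arbeit", "zimmer", "fahrrad", "fahren", "rad"}
print(decompound("Windjacke", WORDS))      # ['wind', 'jacke']
print(decompound("Arbeitszimmer", WORDS))  # ['arbeit', 'zimmer']
print(decompound("Fahrrad", WORDS))        # ['fahrrad']
```

Trying the remainder unchanged before stripping the linker keeps genuine s-initial parts intact when both readings are possible.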

[Solr 4.0] Is it possible to do soft commit from code and not configuration only

2012-04-12 Thread Lyuba Romanchuk
Hi, I need to configure Solr so that the opened searcher sees a new document immediately after it is added to the index, and I don't want to perform a commit each time a new document is added. I tried to configure maxDocs=1 under autoSoftCommit in solrconfig.xml but it didn't help.

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
From: Walter Underwood German noun decompounding is a little more complicated than it might seem. There can be transformations or inflections, like the s in Weihnachtsbaum (Weihnachten/Baum). I remember from my linguistics studies that the terminus technicus for these is Fugenmorphem

Re: [Solr 4.0] Is it possible to do soft commit from code and not configuration only

2012-04-12 Thread Mark Miller
On Apr 12, 2012, at 11:28 AM, Lyuba Romanchuk wrote: Hi, I need to configure Solr so that the opened searcher sees a new document immediately after it is added to the index, and I don't want to perform a commit each time a new document is added. I tried to configure

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Mark Miller
Please see the documentation: http://wiki.apache.org/solr/SolrCloud#Required_Config schema.xml You must have a _version_ field defined: <field name="_version_" type="long" indexed="true" stored="true"/> On Apr 11, 2012, at 9:10 AM, Benson Margulies wrote: I didn't have a _version_ field, since

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
On 12 Apr 2012 at 17:46, Michael Ludwig wrote: Some compounds probably should not be decompounded, like Fahrrad (fahren/Rad). With a dictionary-based stemmer, you might decide to avoid decompounding for words in the dictionary. Good point. More or less, Fahrrad is generally abbreviated

Re: Problem to integrate Solr in Jetty (the first example in the Apache Solr 3.1 Cookbook)

2012-04-12 Thread Shawn Heisey
On 4/12/2012 2:21 AM, Bastian Hepp wrote: When I try to start I get this error message: C:\jetty-solr>java -jar start.jar java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at

Re: Large Index and OutOfMemoryError: Map failed

2012-04-12 Thread Mark Miller
On Apr 12, 2012, at 6:07 AM, Michael McCandless wrote: Your largest index has 66 segments (690 files) ... biggish but not insane. With 64K maps you should be able to have ~47 searchers open on each core. Enabling compound file format (not the opposite!) will mean fewer maps ... ie should

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Benson Margulies
On Thu, Apr 12, 2012 at 11:56 AM, Mark Miller markrmil...@gmail.com wrote: Please see the documentation: http://wiki.apache.org/solr/SolrCloud#Required_Config Did I fail to find this in google or did I just goad you into a writing job? I'm inclined to write a JIRA asking for _version_ to be

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
On Apr 12, 2012, at 8:46 AM, Michael Ludwig wrote: I remember from my linguistics studies that the terminus technicus for these is Fugenmorphem (interstitial or joint morpheme). That is some excellent linguistic jargon. I'll file that with hapax legomenon. If you don't highlight, you can get

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Markus Jelsma
On Thursday 12 April 2012 18:00:14 Paul Libbrecht wrote: On 12 Apr 2012 at 17:46, Michael Ludwig wrote: Some compounds probably should not be decompounded, like Fahrrad (fahren/Rad). With a dictionary-based stemmer, you might decide to avoid decompounding for words in the dictionary.

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
On Apr 12, 2012, at 9:00 AM, Paul Libbrecht wrote: More or less, Fahrrad is generally abbreviated as Rad (even though Rad can mean wheel and bike). A synonym could handle this, since fahren would not be a good match. It is a judgement call, but this seems more like an equivalence Fahrrad = Rad

Re: codecs for sorted indexes

2012-04-12 Thread Michael McCandless
Do you mean you are pre-sorting the documents (by what criteria?) yourself, before adding them to the index? In which case... you should already be seeing some benefits (smaller index size) than had you randomly added them (ie the vInts should take fewer bytes), I think. (Probably the savings
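
A small Python sketch of why pre-sorting can shrink postings. It assumes doc IDs are stored as gaps encoded with Lucene's VInt scheme (7 payload bits per byte, low-order group first, high bit set when another byte follows), which is what Michael's vInt remark suggests. Newer Lucene 4 codecs may pack postings differently, so the byte counts only illustrate the idea.

```python
def write_vint(value):
    """Encode a non-negative int VInt-style: 7 bits per byte,
    low-order bits first, high bit set when more bytes follow."""
    if value < 0:
        raise ValueError("VInt encodes non-negative values only")
    out = bytearray()
    while value & ~0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def postings_bytes(doc_ids):
    """Bytes needed to store a posting list as VInt-encoded doc ID gaps."""
    total, previous = 0, 0
    for doc_id in sorted(doc_ids):
        total += len(write_vint(doc_id - previous))
        previous = doc_id
    return total


# Same number of matching docs; only their placement in docid order differs.
clustered = range(0, 1000)               # adjacent docids: every gap fits in 1 byte
scattered = range(0, 1_000_000, 1000)    # gaps of 1000 need 2 bytes each
print(postings_bytes(clustered), postings_bytes(scattered))  # 1000 1999
```

The saving only materializes if sorting happens to place documents that share terms close together; sorting by a document-level score does not guarantee that.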

Re: Error

2012-04-12 Thread Abhishek tiwari
i am using 3.4 solr version... please assist... On Thu, Apr 12, 2012 at 8:41 PM, Erick Erickson erickerick...@gmail.com wrote: Please review: http://wiki.apache.org/solr/UsingMailingLists You haven't said whether, for instance, you're using trunk which is the only version that supports the

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-12 Thread Shawn Heisey
On 4/12/2012 4:52 AM, pcrao wrote: I think the index is getting corrupted because StreamingUpdateSolrServer is keeping reference to some index files that are being deleted by EmbeddedSolrServer during commit/optimize process. As a result when I Index(Full) using EmbeddedSolrServer and then do

Re: [Solr 4.0] Is it possible to do soft commit from code and not configuration only

2012-04-12 Thread Lyuba Romanchuk
Hi Mark, Thank you for the reply. I tried to normalize data like in relational databases: - there are some types of documents, where: - documents with the same type have the same fields - documents of different types may have different fields - but all documents have type

Re: Error

2012-04-12 Thread Erick Erickson
The termfreq function is only valid for trunk. You're using 3.4. Since 'termfreq' is not recognized, Solr gets confused. Best Erick On Thu, Apr 12, 2012 at 10:20 AM, Abhishek tiwari abhishek.tiwari@gmail.com wrote: i am using 3.4 solr version... please assist... On Thu, Apr 12, 2012 at

Re: is there a downside to combining search fields with copyfield?

2012-04-12 Thread Shawn Heisey
On 4/12/2012 7:27 AM, geeky2 wrote: currently, my schema has individual fields to search on. are there advantages or disadvantages to taking several of the individual search fields and combining them into a single search field? would this affect search times, term tokenization or possibly

Re: Problem to integrate Solr in Jetty (the first example in the Apache Solr 3.1 Cookbook)

2012-04-12 Thread Bastian Hepp
Thanks Shawn, I think I'll stay with the built-in one. I had problems with Solr Cell, but I was able to fix them. Greetings, Bastian On 12 April 2012 18:02, Shawn Heisey s...@elyograg.org wrote: Bastian, The jetty.xml included with Solr is littered with org.mortbay class references, which are

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Mark Miller
google must not have found it - i put that in a month or so ago I believe - at least weeks. As you can see, there is still a bit to fill in, but it covers the high level. I'd like to add example snippets for the rest soon. On Thu, Apr 12, 2012 at 12:04 PM, Benson Margulies

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Chris Hostetter
: Please see the documentation: http://wiki.apache.org/solr/SolrCloud#Required_Config : : schema.xml : : You must have a _version_ field defined: : : <field name="_version_" type="long" indexed="true" stored="true"/> Seems like this is the kind of thing that should make Solr fail hard and fast on

Re: solr 3.4 with nTiers = 2: usage of ids param causes NullPointerException (NPE)

2012-04-12 Thread Mikhail Khludnev
Dmitry, The last NPE in HighlightingComponent is just a sad coding issue. A few rows later we can see that the developer expected some docs not to be found: // remove nulls in case not all docs were able to be retrieved rb.rsp.add("highlighting", SolrPluginUtils.removeNulls(new

Re: solr 3.4 with nTiers = 2: usage of ids param causes NullPointerException (NPE)

2012-04-12 Thread Yonik Seeley
On Wed, Apr 11, 2012 at 8:16 AM, Dmitry Kan dmitry@gmail.com wrote: We have a system with nTiers, that is: Solr front base -> Solr front -> shards Although the architecture had this in mind (multi-tier), all of the pieces are not yet in place to allow it. The errors you see are a direct

RE: solr 3.5 taking long to index

2012-04-12 Thread Rohit
Thanks for pointing these out, but I still have one concern, why is the Virtual Memory running in 300g+? Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg -Original Message- From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] Sent: 12 April 2012 11:58 To:

RE: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Rohit
Thanks for pointing these out, but I still have one concern, why is the Virtual Memory running in 300g+? Regards, Rohit -Original Message- From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com] Sent: 12 April 2012 13:43 To: solr-user@lucene.apache.org Subject: Re: Solr 3.5

Re: term frequency outweighs exact phrase match

2012-04-12 Thread alxsss
In that case documents 1 and 2 will not be in the results. We need them to be shown in the results too, but ranked after the docs with an exact match. I think omitting term frequency when calculating ranking in phrase queries would solve this issue, but I do not see such a parameter in the configs.

Re: solr 3.4 with nTiers = 2: usage of ids param causes NullPointerException (NPE)

2012-04-12 Thread Dmitry Kan
Mikhail, Thanks for sharing your thoughts. Yes I have tried checking for NULL and the entire chain of queries between tiers seems to work. But I suspect, that some docs will be missing. In principle, unless there is an OutOfMemory or a shard down, the doc ids should be retrieving valid documents.

Wildcard searching

2012-04-12 Thread Kissue Kissue
Hi, I am using the edismax query handler with solr 3.5. From the Solr admin interface when i do a wildcard search with the string: edge*, all documents are returned with exactly the same score. When i do the same search from my application using SolrJ to the same solr instance, only a few

Re: solr 3.4 with nTiers = 2: usage of ids param causes NullPointerException (NPE)

2012-04-12 Thread Dmitry Kan
Thanks Yonik, This is what I expected. How big would the change be if I started with just the Query and Highlight components? Did the change to QueryComponent I made make any sense to you? It would of course mean a custom solution, which I'm willing to contribute as a patch (in case anyone

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Mark Miller
I think someone already made a JIRA issue like that. I think Yonik might have had an opinion about it that I cannot remember right now. On Thu, Apr 12, 2012 at 2:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Please see the documentation:

Re: Wildcard searching

2012-04-12 Thread Kissue Kissue
Correction: this difference between Solr admin scores and SolrJ scores happens with leading wildcard queries, e.g. *edge. On Thu, Apr 12, 2012 at 8:13 PM, Kissue Kissue kissue...@gmail.com wrote: Hi, I am using the edismax query handler with solr 3.5. From the Solr admin interface when i do

Re: is there a downside to combining search fields with copyfield?

2012-04-12 Thread geeky2
You end up with one multivalued field, which means that you can only have one analyzer chain. actually two of the three fields being considered for combination into a single field ARE multivalued fields. would this be an issue? With separate fields, each field can be analyzed

Re: Suggester not working for digit starting terms

2012-04-12 Thread jmlucjav
Well now I am really lost... 1. yes I want to suggest whole sentences too, I want the tokenizer to be taken into account, and apparently it is working for me in 3.5.0?? I get suggestions that are like foo bar abc. Maybe what you mention is only for file based dictionaries? I am using the field

searching across multiple fields using edismax - am i setting this up right?

2012-04-12 Thread geeky2
hello all, i just want to check to make sure i have this right. i was reading on this page: http://wiki.apache.org/solr/ExtendedDisMax, thanks to shawn for educating me. *i want the user to be able to fire a requestHandler but search across multiple fields (itemNo, productType and brand)

Re: Responding to Requests with Chunks/Streaming

2012-04-12 Thread Mikhail Khludnev
Hello Developers, I just want to ask: don't you think that response streaming can be useful for things like OLAP? E.g. if you have a sharded index presorted and pre-joined the BJQ way, you can calculate counts in many cube cells in parallel. Essential distributed test for response streaming just

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Yonik Seeley
On Thu, Apr 12, 2012 at 2:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Please see the documentation: http://wiki.apache.org/solr/SolrCloud#Required_Config : : schema.xml : : You must have a _version_ field defined: : : <field name="_version_" type="long" indexed="true" stored="true"/>

[ANNOUNCE] Apache Solr 3.6 released

2012-04-12 Thread Robert Muir
12 April 2012, Apache Solr™ 3.6.0 available The Lucene PMC is pleased to announce the release of Apache Solr 3.6.0. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting,

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Chris Hostetter
: Off the top of my head: : _version_ is needed for solr cloud where a leader forwards updates to : replicas, unless you're handling update distribution yourself or : providing pre-built shards. : _version_ is needed for realtime-get and optimistic locking : : We should document for sure... but

RE: [ANNOUNCE] Apache Solr 3.6 released

2012-04-12 Thread Robert Petersen
I think this page needs updating... it says it's not out yet. https://wiki.apache.org/solr/Solr3.6 -Original Message- From: Robert Muir [mailto:rm...@apache.org] Sent: Thursday, April 12, 2012 1:33 PM To: d...@lucene.apache.org; solr-user@lucene.apache.org; Lucene mailing list;

Re: [ANNOUNCE] Apache Solr 3.6 released

2012-04-12 Thread Robert Muir
Hi, Just edit it! It's a wiki page anyone can edit! There are probably other out-of-date ones too. On Thu, Apr 12, 2012 at 5:57 PM, Robert Petersen rober...@buy.com wrote: I think this page needs updating... it says it's not out yet. https://wiki.apache.org/solr/Solr3.6 -Original

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Benson Margulies
I'm probably confused, but it seems to me that the case I hit does not meet any of Yonik's criteria. I have no replicas. I'm running SolrCloud in the simple mode where each doc ends up in exactly one place. I think that it's just a bug that the code refuses to do the local deletion when there's

Re: codecs for sorted indexes

2012-04-12 Thread Carlos Gonzalez-Cadenas
Hello Michael, Yes, we are pre-sorting the documents before adding them to the index. We have a score associated to every document (not an IR score but a document-related score that reflects its importance). Therefore, the document with the biggest score will have the lowest docid (we add it

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Benson Margulies
On Thu, Apr 12, 2012 at 2:14 PM, Mark Miller markrmil...@gmail.com wrote: google must not have found it - i put that in a month or so ago I believe - at least weeks. As you can see, there is still a bit to fill in, but it covers the high level. I'd like to add example snippets for the rest

Re: is there a downside to combining search fields with copyfield?

2012-04-12 Thread Shawn Heisey
On 4/12/2012 1:37 PM, geeky2 wrote: can you elaborate on this and how EDisMax would preclude the need for copyfield? i am using extended dismax now in my response handlers. here is an example of one of my requestHandlers requestHandler name=partItemNoSearch class=solr.SearchHandler

Re: Solr Scoring

2012-04-12 Thread Erick Erickson
No, I don't think there's an OOB way to make this happen. It's a recurring theme, make exact matches score higher than stemmed matches. Best Erick On Thu, Apr 12, 2012 at 5:18 AM, Kissue Kissue kissue...@gmail.com wrote: Hi, I have a field in my index called itemDesc which i am applying

Re: Solr Scoring

2012-04-12 Thread Walter Underwood
It is easy. Create two fields, text_exact and text_stem. Don't use the stemmer in the first chain, do use the stemmer in the second. Give the text_exact a bigger weight than text_stem. wunder On Apr 12, 2012, at 4:34 PM, Erick Erickson wrote: No, I don't think there's an OOB way to make this
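
Walter's two-field approach, sketched as query-side parameters in Python. It assumes schema.xml already defines text_exact (no stemmer) and text_stem (with EnglishMinimalStemFilterFactory), both filled from itemDesc via copyField, and that the request uses the (e)dismax parser so that qf accepts per-field boosts. The boost values and the endpoint are illustrative, not tuned recommendations.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint; field names follow Walter's text_exact / text_stem example.
SOLR_SELECT = "http://localhost:8983/solr/select"


def exact_over_stemmed(user_query, exact_boost=2.0, stem_boost=1.0):
    """Query both fields; a document containing the exact surface form also
    matches text_exact and so receives the larger boost."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": f"text_exact^{exact_boost} text_stem^{stem_boost}",
    }


url = SOLR_SELECT + "?" + urlencode(exact_over_stemmed("Edges"))
print(parse_qs(urlparse(url).query)["qf"])
```

A search for Edges then matches documents containing Edge only through text_stem, while documents containing Edges match both fields and should rank higher.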

Re: two structures in solr

2012-04-12 Thread Erick Erickson
You have to take off your DB hat when using Solr <g>... There is no problem at all having documents in the same index that are of different types. There is no penalty for field definitions that aren't used. That is, you can easily have two different types of documents in the same index. It's all

Re: solr 3.5 taking long to index

2012-04-12 Thread Shawn Heisey
On 4/12/2012 12:42 PM, Rohit wrote: Thanks for pointing these out, but I still have one concern, why is the Virtual Memory running in 300g+? Solr 3.5 uses MMapDirectoryFactory by default to read the index. This does an mmap on the files that make up your index, so their entire contents are
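
To see why mapping an index inflates virtual memory without consuming the same amount of RAM, here is a minimal Python illustration using the standard mmap module (not Lucene's MMapDirectory itself): the mapping's length equals the file size as soon as it is created, even though no page has been read yet. The 64 MiB temporary file stands in for an index file and is an arbitrary choice.

```python
import mmap
import os
import tempfile


def mapped_length(path):
    """Map the whole file read-only and return the size of the mapping."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # Address space for the full file is reserved here; pages are only
        # loaded from disk when they are actually touched.
        return len(m)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(64 * 1024 * 1024)  # 64 MiB stand-in for an index file
    path = tmp.name

try:
    print(mapped_length(path))  # 67108864
finally:
    os.remove(path)
```

On Linux, tools such as top report such mappings under VIRT, so a large mapped index can make the virtual size look far larger than the physical memory actually in use.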

Re: Dismax request handler differences Between Solr Version 3.5 and 1.4

2012-04-12 Thread Erick Erickson
Then I suspect your solrconfig is different or you're using a *slightly* different URL. When you specify defType=dismax, you're NOT going to the dismax requestHandler. You're specifying a dismax style parser, and Solr expects that you're going to provide all the parameters on the URL. To wit:
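Erick's own example is truncated, so the following is not it; it is a sketch of the distinction he describes. With `defType=dismax` the request goes to whatever handler the URL names (assumed here to be `/select` on a local Solr) and only the query parser changes, so `qf`, `pf` and `mm` must travel on the URL unless that handler's own defaults supply them. Field names and values are hypothetical.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed location

def dismax_on_url(q):
    # defType only swaps the query parser; it does not pick up the defaults
    # configured on a separately named dismax requestHandler.
    params = {
        "defType": "dismax",
        "q": q,
        "qf": "title^3 body",  # hypothetical fields to search
        "pf": "title^5",       # hypothetical phrase boost
        "mm": "2<75%",         # minimum-should-match spec
    }
    return f"{SOLR_BASE}/select?{urlencode(params)}"

print(dismax_on_url("ipod"))
```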

Re: Further questions about behavior in ReversedWildcardFilterFactory

2012-04-12 Thread Erick Erickson
There is special handling built into Solr (but not, I think, into Lucene) that deals with the reversed case; that's probably the source of your differences. Leading wildcards are extremely painful if you don't do some trick like Solr does with the reversed stuff. In order to run, you have to spin
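The trick Erick refers to is indexing each token reversed as well, so a leading-wildcard query such as `*ness` can be answered as a prefix search on the reversed terms. A toy sketch of the idea, assuming a plain Python set as the term dictionary and a hypothetical `MARKER` character to keep reversed terms apart from normal ones; this illustrates the technique and is not Solr's ReversedWildcardFilterFactory code.

```python
MARKER = "\u0001"  # assumed marker prefix distinguishing reversed terms

def index_terms(tokens):
    """Index each token as-is and also reversed, with a marker prefix."""
    terms = set()
    for t in tokens:
        terms.add(t)
        terms.add(MARKER + t[::-1])
    return terms

def leading_wildcard(terms, pattern):
    """Answer a '*suffix' query by prefix-matching the reversed terms."""
    if not pattern.startswith("*") or "*" in pattern[1:]:
        raise ValueError("only '*suffix' patterns are supported")
    prefix = MARKER + pattern[1:][::-1]
    # A real index would seek to `prefix` in its sorted term dictionary
    # instead of scanning every term as this toy does.
    return sorted(t[len(MARKER):][::-1] for t in terms if t.startswith(prefix))

terms = index_terms(["kindness", "darkness", "kingdom"])
print(leading_wildcard(terms, "*ness"))  # ['darkness', 'kindness']
```

The cost is a larger index, since every token is stored twice; the payoff is that the expensive "scan all terms" case turns into an ordinary prefix lookup.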

Re: Suggester not working for digit starting terms

2012-04-12 Thread Robert Muir
On Thu, Apr 12, 2012 at 3:52 PM, jmlucjav jmluc...@gmail.com wrote: Well now I am really lost... 1. Yes, I want to suggest whole sentences too. I want the tokenizer to be taken into account, and apparently it is working for me in 3.5.0?? I get suggestions like "foo bar abc". Maybe what

Re: Import null values from XML file

2012-04-12 Thread Erick Erickson
What does "treated as null" mean? Deleted from the doc? The problem here is that null-ness is kind of tricky. What behaviors do you want out of Solr in the NULL case? You can drop this out of the document by writing a custom updateHandler. It's actually quite simple to do. Best Erick On Thu, Apr
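Erick suggests a custom update handler on the Solr side. As a swapped-in alternative that needs no Solr plugin, the same "drop the field when it is null" behaviour can be applied on the client before documents are sent. A minimal sketch, assuming the input uses Solr's `<add><doc><field name="...">` XML layout and that "null" means an empty element or the literal text `null` (both are assumptions; the original question's file is not shown):

```python
import xml.etree.ElementTree as ET

NULL_MARKERS = {"", "null", "NULL"}  # assumed representations of a null value

def parse_docs(xml_text):
    """Turn <doc><field name=...>value</field></doc> elements into dicts,
    omitting any field whose value looks null."""
    docs = []
    for doc in ET.fromstring(xml_text).iter("doc"):
        fields = {}
        for field in doc.iter("field"):
            value = (field.text or "").strip()
            if value in NULL_MARKERS:
                continue  # drop the field instead of indexing the string "null"
            fields[field.get("name")] = value
        docs.append(fields)
    return docs

sample = """<add>
  <doc><field name="id">1</field><field name="price">null</field></doc>
  <doc><field name="id">2</field><field name="price">9.50</field></doc>
</add>"""
print(parse_docs(sample))
```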

Re: codecs for sorted indexes

2012-04-12 Thread Robert Muir
On Thu, Apr 12, 2012 at 6:35 PM, Carlos Gonzalez-Cadenas c...@experienceon.com wrote: Hello Michael, Yes, we are pre-sorting the documents before adding them to the index. We have a score associated to every document (not an IR score but a document-related score that reflects its importance).
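Carlos's pre-sorting step, adding documents in descending order of a per-document importance score so that insertion order follows importance, can be sketched as below. The `importance` field name and the batching are hypothetical; the sketch covers only the client-side ordering, not anything the index or codec does with it.

```python
def batches_by_importance(docs, batch_size):
    """Yield documents in descending order of their 'importance' score
    (a hypothetical field), in batches ready to send to Solr."""
    ordered = sorted(docs, key=lambda d: d["importance"], reverse=True)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]

docs = [
    {"id": "a", "importance": 0.2},
    {"id": "b", "importance": 0.9},
    {"id": "c", "importance": 0.5},
]
for batch in batches_by_importance(docs, batch_size=2):
    print([d["id"] for d in batch])
# ['b', 'c']
# ['a']
```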

Re: searching across multiple fields using edismax - am i setting this up right?

2012-04-12 Thread Erick Erickson
Looks good on a quick glance. There are a couple of things... 1) There's no need for the qt param _if_ you specify the name as /partItemNoSearch; just use blahblah/solr/partItemNoSearch. There's a JIRA about when/if you need it. Either will do; it's up to you which you prefer. 2) I'd consider
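Erick's first point describes two ways to reach the same handler. A sketch that only builds the URLs; `http://localhost:8983/solr` is an assumed stand-in for his placeholder `blahblah/solr`, and the query value is made up.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed; the thread writes "blahblah/solr"

def via_qt(handler_name, q):
    # Handler registered as name="partItemNoSearch": select it with qt on /select.
    return f"{SOLR_BASE}/select?{urlencode({'qt': handler_name, 'q': q})}"

def via_path(handler_name, q):
    # Handler registered as name="/partItemNoSearch": address it by path, no qt.
    return f"{SOLR_BASE}/{handler_name.lstrip('/')}?{urlencode({'q': q})}"

print(via_qt("partItemNoSearch", "itemNo:1234"))
print(via_path("/partItemNoSearch", "itemNo:1234"))
```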

Re: Solr Scoring

2012-04-12 Thread Erick Erickson
GAH! I had my head in "make this happen in one field" mode when I wrote my response, without being explicit. Of course Walter's solution is pretty much the standard way to deal with this. Best Erick On Thu, Apr 12, 2012 at 5:38 PM, Walter Underwood wun...@wunderwood.org wrote: It is easy. Create two

Re: solr hangs

2012-04-12 Thread Peter Markey
Thanks for the response. I have given the instance 8 GB, and it has only a few thousand documents (15 fields each, with a small amount of data). Apparently the problem is that the process (the Solr Jetty instance) is consuming lots of threads; one time it consumed around 50k threads

Re: Solr Http Caching

2012-04-12 Thread Chris Hostetter
: Are any of you using Solr Http caching? I am interested to see how people : use this functionality. I have an index that basically changes once a day : at midnight. Is it okay to enable Solr Http caching for such an index and : set the max age to 1 day? Any potential issues? : : I am using
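One thing worth checking before setting max-age to a full day: a response cached shortly before the midnight update can still be served for almost 24 hours after the index has changed. The arithmetic as a small sketch; it only computes numbers and does not configure Solr, whose HTTP caching headers are set in the httpCaching section of solrconfig.xml.

```python
from datetime import datetime, timedelta

ONE_DAY = 24 * 60 * 60  # 86400 seconds

def seconds_until_midnight(now):
    """Seconds from `now` until the next midnight (the assumed update time)."""
    midnight = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    return int((midnight - now).total_seconds())

def worst_case_staleness(cached_at, max_age=ONE_DAY):
    """Seconds after the next midnight update during which a response
    cached at `cached_at` may still be served, if max_age is honoured."""
    expires = cached_at + timedelta(seconds=max_age)
    next_update = cached_at + timedelta(seconds=seconds_until_midnight(cached_at))
    return max(0, int((expires - next_update).total_seconds()))

cached_at = datetime(2012, 4, 12, 23, 50)
print(seconds_until_midnight(cached_at))  # 600
print(worst_case_staleness(cached_at))    # 85800
```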

Re: Does the lucene can read the index file from solr?

2012-04-12 Thread a sd
Hi neosky, how did you do it? I need to do this too. Thanks. On Thu, Apr 12, 2012 at 9:35 PM, neosky neosk...@yahoo.com wrote: Thanks! I will try again

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Otis Gospodnetic
Hello Ali, I'm trying to set up a large-scale *Crawl + Index + Search* infrastructure using Nutch and Solr/Lucene. The targeted scale is *5 billion web pages*, crawled + indexed every *4 weeks*, with a search latency of less than 0.5 seconds. That's fine. Whether it's doable with any tech
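A back-of-the-envelope figure for Ali's target, assuming the 4-week cycle is 28 days and the load is spread evenly across it (real crawls are burstier):

```python
PAGES = 5_000_000_000
CYCLE_SECONDS = 28 * 24 * 60 * 60  # 4 weeks = 2,419,200 seconds

pages_per_second = PAGES / CYCLE_SECONDS
pages_per_day = PAGES // 28

print(round(pages_per_second))  # 2067
print(pages_per_day)            # 178571428
```

That is a sustained rate of roughly 2,000 pages crawled and indexed per second, around the clock, before any headroom for re-crawls or failures.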

Re: term frequency outweighs exact phrase match

2012-04-12 Thread Chris Hostetter
: I use solr 3.5 with edismax. I have the following issue with phrase : search. For example if I have three documents with content like : : 1. apache apache : 2. solr solr : 3. apache solr : : then a search for "apache solr" displays documents in the order 1, 2, 3 : instead of 3, 2, 1 because term
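Hoss's reply is truncated here. One common lever for this complaint in edismax is the `pf` (phrase fields) parameter, which adds a boost when the whole query appears as a phrase in a field. A sketch of such a request; the field name `content`, the boost value and the base URL are assumptions, and whether the boost is enough to outrank the repeated-term documents depends on the value chosen.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed location

def phrase_boosted(q, field="content", phrase_boost=10):
    params = {
        "defType": "edismax",
        "q": q,
        "qf": field,                      # match the individual terms
        "pf": f"{field}^{phrase_boost}",  # extra boost when q occurs as a phrase
    }
    return f"{SOLR_BASE}/select?{urlencode(params)}"

print(phrase_boosted("apache solr"))
```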

RE: solr 3.5 taking long to index

2012-04-12 Thread Rohit
The machine has a total RAM of around 46GB. My biggest concern is that Solr index time gradually increases and then commits stop because of timeouts. Our commit rate is very high, but I am not able to find the root cause of the issue. Regards, Rohit Mobile: +91-9901768202 About Me:

Re: solr 3.5 taking long to index

2012-04-12 Thread Shawn Heisey
On 4/12/2012 8:42 PM, Rohit wrote: The machine has a total RAM of around 46GB. My biggest concern is that Solr index time gradually increases and then commits stop because of timeouts. Our commit rate is very high, but I am not able to find the root cause of the issue. For good performance,
