Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Rob Brown
Apologies if things were a little vague. Given the example snippet to index (numbered to show searches needed to match)... 1: i am a sales-manager in here 2: using asp.net and .net daily 3: working in design. 4: using something called sage 200. and i'm fluent 5: german sausages. 6: busy AE dept

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Ted Dunning
This is true with Lucene as it stands. It would be much faster if there were a specialized in-memory index such as is typically used with high performance search engines. On Tue, Feb 7, 2012 at 9:50 PM, Lance Norskog goks...@gmail.com wrote: Experience has shown that it is much faster to run

Re:Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread James
But Solr does not have an in-memory index, am I right? At 2012-02-08 16:17:49, Ted Dunning ted.dunn...@gmail.com wrote: This is true with Lucene as it stands. It would be much faster if there were a specialized in-memory index such as is typically used with high performance search

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Patrick Plaatje
A start may be to use a RAM disk for that. Mount it as a normal disk and store the index files there. Have a read here: http://en.wikipedia.org/wiki/RAM_disk Cheers, Patrick 2012/2/8 Ted Dunning ted.dunn...@gmail.com This is true with Lucene as it stands. It would be much faster if

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Dmitry Kan
Hi, This talk has some interesting details on setting up a Lucene index in RAM: http://www.lucidimagination.com/devzone/events/conferences/revolution/2011/lucene-yelp Would be great to hear your findings! Dmitry 2012/2/8 James ljatreey...@163.com Is there any practice to load index into

Query in starting solr 3.5

2012-02-08 Thread mechravi25
Hi, I am using Solr 3.5. I moved the data import handler files from Solr 1.4 (which I used previously) to the new Solr. When I tried to start Solr 3.5, I got the following message in my log: WARNING: XML parse warning in solrres:/dataimport.xml, line 2, column 95: Include operation

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Andrzej Bialecki
On 08/02/2012 09:17, Ted Dunning wrote: This is true with Lucene as it stands. It would be much faster if there were a specialized in-memory index such as is typically used with high performance search engines. This could be implemented in Lucene trunk as a Codec. The challenge though is to

Re: Improving performance for SOLR geo queries?

2012-02-08 Thread Matthias Käppler
Hi Erick, if we're not doing geo searches, we filter by location tags that we attach to places. This is simply a hierarchical regional id, which is simple to filter for, but much less flexible. We use that on Web a lot, but not on mobile, where we want to perform searches in arbitrary radii
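
Filtering by an arbitrary radius, instead of a fixed regional tag, means computing a distance for each candidate place. The sketch below shows the kind of great-circle check a radius filter performs, using the haversine formula and an assumed mean Earth radius of 6371 km. The function names and the `(name, lat, lon)` data layout are hypothetical, and the brute-force loop says nothing about how Solr's spatial filters are implemented internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(places, center, radius_km):
    """Brute-force radius filter over hypothetical (name, lat, lon) tuples."""
    clat, clon = center
    return [name for name, lat, lon in places
            if haversine_km(clat, clon, lat, lon) <= radius_km]


places = [("a", 0.0, 0.5), ("b", 0.0, 2.0)]
print(within_radius(places, (0.0, 0.0), 100))  # ['a']
```

This check costs one distance computation per candidate, which is why index-side structures that narrow the candidate set matter for performance.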

Re: URI Encoding with Solr and Weblogic

2012-02-08 Thread Elisabeth Adler
Hi, I found a solution to it. Adding the Weblogic Server Argument -Dfile.encoding=UTF-8 did not affect the encoding. Only a change to the .war file's weblogic.xml and redeployment of the modified .war solved it. I added the following to the weblogic.xml: charset-params input-charset

How to reindex about 10Mio. docs

2012-02-08 Thread Vadim Kisselmann
Hello folks, I want to reindex about 10 million docs from one Solr (1.4.1) to another Solr (1.4.1). I changed my schema.xml (field types sint to slong), so standard replication would fail. What is the fastest and smartest way to manage this? This sounds great (EntityProcessor):

Re: How to reindex about 10Mio. docs

2012-02-08 Thread Ahmet Arslan
I want to reindex about 10 million docs from one Solr (1.4.1) to another Solr (1.4.1). I changed my schema.xml (field types sint to slong), so standard replication would fail. What is the fastest and smartest way to manage this? This sounds great (EntityProcessor):

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Robert Stewart
I concur with this. As long as index segment files are cached in the OS file cache, performance is about as good as it gets. Pulling segment files into RAM inside the JVM process may actually be slower, given Lucene's existing data structures and algorithms for reading segment file data. If you have

Re: How to reindex about 10Mio. docs

2012-02-08 Thread Vadim Kisselmann
Hi Ahmet, thanks for the quick response :) I'd already thought the same... And it will be a pain to export and import this huge doc set as CSV. Is there another solution? Regards Vadim 2012/2/8 Ahmet Arslan iori...@yahoo.com: I want to reindex about 10 million docs from one Solr (1.4.1) to another

usage of /etc/jetty.xml when debugging Solr in Eclipse

2012-02-08 Thread jmlucjav
Hi, I am following http://www.lucidimagination.com/devzone/technical-articles/setting-apache-solr-eclipse in order to be able to debug Solr in Eclipse. I got it working fine. Now, I usually use ./etc/jetty.xml to set the logging configuration. When starting Jetty in Eclipse I don't see any log files

Re: How to reindex about 10Mio. docs

2012-02-08 Thread Vadim Kisselmann
Another problem appeared ;) How can I export my docs in CSV format? In Solr 3.1+ I can use the query param wt=csv, but what about Solr 1.4.1? Best Regards Vadim 2012/2/8 Vadim Kisselmann v.kisselm...@googlemail.com: Hi Ahmet, thanks for the quick response :) I've already thought the same... And it will
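
One way to move a large document set between instances is to page through it with the standard `start` and `rows` request parameters and re-post each page to the new instance. The sketch below only builds the paged request URLs. The base URL and page size are placeholders, `wt=csv` would only apply on 3.1+ (on 1.4.1 a format such as `wt=json` would have to be converted on the client side), and the approach assumes every field you need is stored, since unstored fields cannot be read back. Deep `start` offsets also get slower as they grow.

```python
from urllib.parse import urlencode


def export_urls(base_url, total_docs, rows=50_000, wt="json", fl="*"):
    """Yield one select URL per page of `rows` documents.

    base_url is a placeholder, e.g. http://old-host:8983/solr
    """
    for start in range(0, total_docs, rows):
        params = {"q": "*:*", "fl": fl, "start": start, "rows": rows, "wt": wt}
        yield f"{base_url}/select?{urlencode(params)}"


urls = list(export_urls("http://localhost:8983/solr", 10_000_000))
print(len(urls))  # 200 pages of 50,000 docs
```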

Custom Document Clustering and Mahout Integration

2012-02-08 Thread Selvam
Hi all, I am trying to write a custom document clustering component that should take all the docs in commit and cluster them; Solr Version:3.5.0 Main Class: public class KMeansClusteringEngine extends DocumentClusteringEngine implements SolrEventListener I added newSearcher event listener, that

Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-08 Thread Erick Erickson
Hmmm, seems OK. Did you re-index after any schema changes? You'll learn to love admin/analysis for questions like this; that page should show you what the actual tokenization results are. Make sure to click the verbose check boxes. Best Erick On Tue, Feb 7, 2012 at 10:52 PM, geeky2

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Erick Erickson
Yes, WDF creates multiple tokens. But that has nothing to do with the multiValued suggestion. You can get exactly what you want by: 1) setting multiValued=true in your schema file and re-indexing. Say positionIncrementGap is set to 100. 2) When you index, add the field for each sentence, so your doc
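
The effect of positionIncrementGap can be illustrated with a toy model of token positions: each value of a multiValued field continues the position count of the previous value, plus the gap, so an exact phrase cannot straddle two values. This is a simplified sketch (whitespace tokenization, exact phrases with no slop, hypothetical helper names), not Lucene's actual code; the sentences are taken from the example snippet in this thread.

```python
def index_positions(values, gap):
    """Toy multiValued field: tokens of each value get consecutive
    positions, and `gap` extra positions separate consecutive values."""
    positions = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            positions.setdefault(token, []).append(pos)
            pos += 1
    return positions


def phrase_matches(positions, phrase):
    """Exact phrase (slop 0): each term must sit one position after the previous."""
    terms = phrase.lower().split()
    if not terms:
        return False
    for start in positions.get(terms[0], []):
        if all(start + k in positions.get(t, []) for k, t in enumerate(terms[1:], 1)):
            return True
    return False


sentences = ["using asp.net and .net daily", "working in design."]
print(phrase_matches(index_positions(sentences, 0), "daily working"))    # True
print(phrase_matches(index_positions(sentences, 100), "daily working"))  # False
```

With no gap, "daily working" matches across the sentence boundary; with a gap of 100 it only matches within a single value.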

Re: Fields not indexed?

2012-02-08 Thread Dmitry Kan
What does your schema for the fields look like? On Wed, Feb 8, 2012 at 2:41 PM, Radu Toev radut...@gmail.com wrote: Hi, I am really new to Solr so I apologize if the question is a little off. I was playing with DataImportHandler and tried to index a table in a MS SQL database. I configured

Re: Fields not indexed?

2012-02-08 Thread Radu Toev
The schema.xml is the default file that comes with Solr 3.5, didn't change anything there. On Wed, Feb 8, 2012 at 2:45 PM, Dmitry Kan dmitry@gmail.com wrote: How does your schema for the fields look like? On Wed, Feb 8, 2012 at 2:41 PM, Radu Toev radut...@gmail.com wrote: Hi, I am

Re: Fields not indexed?

2012-02-08 Thread Dmitry Kan
Well, you should add these fields to schema.xml, otherwise Solr won't know about them. On Wed, Feb 8, 2012 at 2:48 PM, Radu Toev radut...@gmail.com wrote: The schema.xml is the default file that comes with Solr 3.5, didn't change anything there. On Wed, Feb 8, 2012 at 2:45 PM, Dmitry Kan

Re: Fields not indexed?

2012-02-08 Thread Radu Toev
I just realized that as I pushed the send button :P Thanks, I'll have a look. On Wed, Feb 8, 2012 at 2:58 PM, Dmitry Kan dmitry@gmail.com wrote: well, you should add these fields in schema.xml, otherwise solr won't know them. On Wed, Feb 8, 2012 at 2:48 PM, Radu Toev radut...@gmail.com

Re: usage of /etc/jetty.xml when debugging Solr in Eclipse

2012-02-08 Thread Bernd Fehling
Hi, run-jetty-run issue #9: ... In the VM Arguments of your launch configuration set -Drjrxml=./jetty.xml. If jetty.xml is in the root of your project it will be used (you can also use a fully qualified path name). The UI port, context and WebApp dir are ignored, since you can define them in

Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-08 Thread geeky2
Hello, thank you for the reply. Yes, I did re-index after the changes to the schema. Also, thank you for the direction on using the analyzer, but I am not sure if I am interpreting the feedback from the analyzer correctly. Here is what I did: in the Field value (Index) box, I placed this:

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Robert Brown
Thanks Erick, I didn't get confused with multiple tokens vs multiValued :) Before I go ahead and re-index 4m docs (and believe me, I'm using the analysis page like a madman!), what do I need to configure to have the following indexed both with and without the dots... .net sales manager. £12.50

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread Ted Dunning
Add this as well: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.155.5030 On Wed, Feb 8, 2012 at 1:56 AM, Andrzej Bialecki a...@getopt.org wrote: On 08/02/2012 09:17, Ted Dunning wrote: This is true with Lucene as it stands. It would be much faster if there were a specialized

How to identify the field with highest score in dismax

2012-02-08 Thread crisfromnova
Hi, According to the Solr documentation, the dismax score is calculated with the formula: (score of the matching clause with the highest score) + ((tie parameter) * (sum of the scores of any other matching clauses)). Is there a way to identify the field on which the matching clause score is the highest? For
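
The formula above can be written out directly. The sketch assumes the per-field clause scores are already known; in practice they would have to be read from somewhere such as the explain section of the debug output, which this code does not parse. The field names and scores are made-up example values.

```python
def dismax_score(clause_scores, tie):
    """Combine per-field clause scores per the dismax formula: the best
    clause's score plus `tie` times the sum of the other clause scores.
    Returns (combined_score, field_with_highest_score)."""
    if not clause_scores:
        return 0.0, None
    best_field = max(clause_scores, key=clause_scores.get)
    best = clause_scores[best_field]
    others = sum(clause_scores.values()) - best
    return best + tie * others, best_field


print(dismax_score({"title": 2.0, "body": 1.0, "tags": 0.5}, 0.25))  # (2.375, 'title')
```

With tie=0 only the best field counts (a pure max); with tie=1 the result becomes a plain sum over all matching fields.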

Sorting solrdocumentlist object after querying

2012-02-08 Thread Kashif Khan
Hi all, I want to sort a SolrDocumentList after it has been queried and obtained from QueryResponse.getResults(). The reason is that I have a SolrDocumentList obtained by querying with QueryResponse.getResults() and I have added a few docs to it. Now I want to sort this SolrDocumentList based on

Wildcard ? issue?

2012-02-08 Thread Dalius Sidlauskas
Sorry for the inaccurate title. I have 3 fields (dc_title, dc_title_unicode, dc_unicode_full) containing the same value: <title xmlns="http://www.tei-c.org/ns/1.0">cal·lígraf</title> and these fields are configured accordingly: <fieldType name="xml" class="solr.TextField" positionIncrementGap="100">

Re: Wildcard ? issue?

2012-02-08 Thread Dalius Sidlauskas
If you cannot read this mail easily, check this ticket: https://issues.apache.org/jira/browse/SOLR-3106 This is a copy. Regards! Dalius Sidlauskas On 08/02/12 15:44, Dalius Sidlauskas wrote: Sorry for the inaccurate title. I have 3 fields (dc_title, dc_title_unicode, dc_unicode_full)

Re: Wildcard ? issue?

2012-02-08 Thread Sethi, Parampreet
Hi Dalius, if you haven't already tried it, check http://localhost:8983/solr/admin/analysis.jsp (enable verbose output for both Field Value index and query for details) for your queries and see which filters/tokenizers are being applied. Hope it helps! -param On 2/8/12 10:48 AM, Dalius Sidlauskas

Re: Wildcard ? issue?

2012-02-08 Thread Dalius Sidlauskas
I have already tried this and it did not help, because it does not highlight matches if a wildcard is used. The field configuration turns the data into: dc_title: calligraf dc_title_unicode: cal·lígraf dc_title_unicode_full: cal·lígraf Debug parsedquery says: [Search for *cal·ligraf*]

Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-08 Thread Erick Erickson
Hmmm, that all looks correct, from the output you pasted I'd expect you to be finding the doc. So next thing: add debugQuery=on to your query and look at the debug information after the list of documents, particularly the parsedQuery bit. Are you searching against the fields you think you are? If

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Erick Erickson
You'll probably have to index them in separate fields to get what you want. The question is always whether it's worth it, is the use-case really well served by having a variant that keeps dots and things? But that's always more a question for your product manager Best Erick On Wed, Feb 8,
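
Indexing the same text into two fields with different analysis is one way to get both behaviours: one field keeps tokens like `.net` and `£12.50` intact, and the other splits on punctuation. The sketch below models the two analysis chains as plain Python functions. The field names and tokenization rules are hypothetical stand-ins rather than a Solr schema; in Solr this would typically be a copyField into a second field with a different field type.

```python
import re


def keep_dots(text):
    """'Exact-ish' analysis: lowercase, split on whitespace, and drop only
    trailing sentence punctuation, so '.net' and '£12.50' survive."""
    tokens = (t.rstrip(".,;:") for t in text.lower().split())
    return [t for t in tokens if t]


def split_dots(text):
    """Split analysis: break on any run of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def index(text):
    # Hypothetical two-field document: same source text, two analyses.
    return {"text_exact": set(keep_dots(text)), "text_split": set(split_dots(text))}


def matches(doc, query):
    """Match if all query tokens are found in either field, analyzing the
    query with each field's own rules."""
    return (set(keep_dots(query)) <= doc["text_exact"]
            or set(split_dots(query)) <= doc["text_split"])


doc = index(".net sales manager. £12.50")
print(sorted(doc["text_exact"]))  # ['.net', 'manager', 'sales', '£12.50']
```

Whether the second field is worth its index size is the product question raised above; the sketch only shows the mechanics.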

Re: Which Tokeniser (and/or filter)

2012-02-08 Thread Robert Brown
Attempting to reproduce legacy behaviour (I know!) of simple SQL substring searching, with and without phrases. I feel simply NGram'ing 4m CVs may be pushing it? On Wed, 8 Feb 2012 11:27:24 -0500, Erick

Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-08 Thread geeky2
Hello, thanks for sticking with me on this... very frustrating. OK, I did perform the query with the debug params using two scenarios: 1) a successful search (where I insert the period / dot) in the itemNo field, and the search returns a document. itemNo:BP2.1UAA
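
To reason about why `BP2.1UAA` and other spellings of the same item number behave differently, it can help to model what a word-delimiter style filter does to the value: split on punctuation and on letter/digit transitions, and optionally also emit the concatenation of all parts (the idea behind the catenateAll option). The functions below are a deliberately simplified approximation (no case-change splitting, no preserveOriginal, no position handling), not the filter's real implementation.

```python
import re


def word_delimiter_parts(value):
    """Split into runs of ASCII letters or runs of digits, dropping
    delimiters such as '.' (roughly: split on delimiters plus
    letter/digit transitions)."""
    return re.findall(r"[A-Za-z]+|[0-9]+", value)


def catenate_all(value):
    """Join all parts into one token, similar in spirit to catenateAll=1."""
    return "".join(word_delimiter_parts(value))


print(word_delimiter_parts("BP2.1UAA"))  # ['BP', '2', '1', 'UAA']
print(catenate_all("BP2.1UAA"))          # BP21UAA
print(word_delimiter_parts("BP21UAA"))   # ['BP', '21', 'UAA']
```

Under this model `BP21UAA` splits into `21` rather than `2` and `1`, so the two spellings only line up through a concatenated token.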

Re: Wildcard ? issue?

2012-02-08 Thread Ahmet Arslan
I have already tried this and it did not help, because it does not highlight matches if a wildcard is used. The field configuration turns the data into: This write-up should explain your scenario: http://wiki.apache.org/solr/MultitermQueryAnalysis
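
The MultitermQueryAnalysis page linked above deals with the fact that wildcard terms historically bypassed most of a field's analysis chain. Here is a toy model of the resulting mismatch: the index side folds `cal·lígraf` to `calligraf`, but an unanalyzed wildcard pattern still contains the middle dot and never matches. The `fold` function is a hypothetical stand-in for the field's real filters, and `fnmatchcase` stands in for Lucene's wildcard matching.

```python
import unicodedata
from fnmatch import fnmatchcase


def fold(text):
    """Hypothetical index-side analysis: lowercase, strip accents, and
    drop the Catalan middle dot (U+00B7)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed
                   if not unicodedata.combining(c) and c != "\u00b7")


indexed_term = fold("cal\u00b7l\u00edgraf")  # 'calligraf'


def wildcard_match(pattern, term, analyze_pattern):
    """Match a wildcard pattern against an indexed term, optionally running
    the non-wildcard characters of the pattern through fold() first."""
    if analyze_pattern:
        pattern = "".join(c if c in "*?" else fold(c) for c in pattern)
    return fnmatchcase(term, pattern)


print(wildcard_match("*cal\u00b7ligraf*", indexed_term, False))  # False
print(wildcard_match("*cal\u00b7ligraf*", indexed_term, True))   # True
```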

Re: solr cloud concepts

2012-02-08 Thread Mark Miller
On Feb 8, 2012, at 10:31 AM, Adeel Qureshi wrote: I have been using Solr for a while and have recently started getting into SolrCloud... I am a bit confused with some of the concepts... 1. What exactly is the relationship between a collection and the core? Can a core have multiple

Re: Sorting solrdocumentlist object after querying

2012-02-08 Thread Ahmet Arslan
I want to sort a SolrDocumentList after it has been queried and obtained from QueryResponse.getResults(). The reason is that I have a SolrDocumentList obtained by querying with QueryResponse.getResults() and I have added a few docs to it. Now I want to sort this SolrDocumentList based on

Re: solr cloud concepts

2012-02-08 Thread Bruno Dumon
Hi Adeel, I just started looking into SolrCloud and had some of the same questions. I wrote a blog with the understanding I gained so far, maybe it will help you: http://outerthought.org/blog/491-ot.html Regards, Bruno. On Wed, Feb 8, 2012 at 4:31 PM, Adeel Qureshi

Re: How to reindex about 10Mio. docs

2012-02-08 Thread Otis Gospodnetic
Vadim, would using XSLT output help? Otis From: Vadim Kisselmann v.kisselm...@googlemail.com To: solr-user@lucene.apache.org Sent: Wednesday,

Re: Using UUID for uniqueId

2012-02-08 Thread François Schiettecatte
Anderson, I would say that this is highly unlikely, but you would need to pay attention to how they are generated; this would be a good place to start: http://en.wikipedia.org/wiki/Universally_unique_identifier Cheers François On Feb 8, 2012, at 1:31 PM, Anderson vasconcelos wrote:
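
To put a rough number on "highly unlikely": a random (version 4) UUID has 122 random bits, and the birthday approximation p ≈ n² / (2 · 2¹²²) estimates the chance of any collision among n generated ids. The sketch assumes ids come from a good random source such as Python's `uuid.uuid4`. Other UUID versions (time- or MAC-based) have different properties, which is why it matters how they are generated.

```python
import uuid

RANDOM_BITS_V4 = 122  # 128 bits minus 6 fixed version/variant bits


def collision_probability(n, bits=RANDOM_BITS_V4):
    """Birthday-bound approximation of P(at least one collision) among n ids."""
    return n * n / (2 * 2 ** bits)


doc_id = uuid.uuid4()
print(doc_id.version)                     # 4
print(collision_probability(10_000_000))  # about 9.4e-24
```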

Thank you all

2012-02-08 Thread Tim Hibbs
All, It appears my attempt at using solr for the application I support is about to fail. I'm personally and professionally disappointed, but I wanted to say Many Thanks to those of you who have provided so much help to so many on this list. In the right hands and in the right environments, it has

solr/tomcat performance.

2012-02-08 Thread adm1n
Hi, I'm running Solr + Tomcat with the following configuration: I have 16 slaves, which are queried by an aggregator, while the aggregator is queried by the users. My slaveUrls variable in solr.xml (on the aggregator) looks like - 'property name=slaveUrls

Index Start Question

2012-02-08 Thread Hoffman, Chase
Please forgive me if this is a dumb question. I've never dealt with SOLR before, and I'm being asked to determine from the logs when a SOLR index is kicked off (it is a Windows server). The TOMCAT service runs continually, so no love there. In parsing the logs, I think

SolrCloud is in trunk.

2012-02-08 Thread Mark Miller
For those that are interested and have not noticed, the latest work on SolrCloud and distributed indexing is now in trunk. SolrCloud is our name for a new set of distributed capabilities that improve upon the old style distributed search and index based replication. It provides for high

Re: Using UUID for uniqueId

2012-02-08 Thread Anderson vasconcelos
Thanks 2012/2/8 François Schiettecatte fschietteca...@gmail.com Anderson I would say that this is highly unlikely, but you would need to pay attention to how they are generated, this would be a good place to start: http://en.wikipedia.org/wiki/Universally_unique_identifier Cheers

Re: SolrCloud is in trunk.

2012-02-08 Thread darren
Good job on this work. A monumental effort. On Wed, 8 Feb 2012 16:41:13 -0500, Mark Miller markrmil...@gmail.com wrote: For those that are interested and have not noticed, the latest work on SolrCloud and distributed indexing is now in trunk. SolrCloud is our name for a new set of

Re: Improving performance for SOLR geo queries?

2012-02-08 Thread Ryan McKinley
Hi Matthias- I'm trying to understand how you have your data indexed so we can give reasonable direction. What field type are you using for your locations? Is it using the solr spatial field types? What do you see when you look at the debug information from debugQuery=true? From my

Re: solr cloud concepts

2012-02-08 Thread Adeel Qureshi
Okay, so after reading Bruno's blog post... let's add slice to the mix as well... so we have got collections, cores, shards, partitions and slices :) ... The whole point of cores is to be able to have different schemas on the same Solr server instance. So how does that change with collections..

linking documents in solr

2012-02-08 Thread T Vinod Gupta
Hi, I have a question about linking documents in Solr and want to know if it's possible. Let's say I have a set of blogs and their authors that I want to index separately. Is it possible to link a document describing a blog to another document describing an author? If yes, can I search for blogs
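
Without join support in the search engine, one common workaround is an application-side join: index blogs and authors as separate document types, query the authors first, then restrict blogs to the returned ids. The sketch below runs both steps over in-memory lists. The field names (`id`, `name`, `author_id`) are hypothetical, and in a real setup each step would be a separate search request.

```python
def author_ids_by_name(authors, name):
    """Step 1: ids of authors matching a name."""
    return {a["id"] for a in authors if a["name"] == name}


def blogs_by_author(blogs, author_ids):
    """Step 2: blogs whose author_id is among the matched ids."""
    return [b for b in blogs if b["author_id"] in author_ids]


authors = [{"id": "a1", "name": "Alice"}, {"id": "a2", "name": "Bob"}]
blogs = [{"id": "b1", "title": "Solr tips", "author_id": "a1"},
         {"id": "b2", "title": "Lucene internals", "author_id": "a2"}]
ids = author_ids_by_name(authors, "Alice")
print([b["title"] for b in blogs_by_author(blogs, ids)])  # ['Solr tips']
```

In Solr terms the second step would typically be a filter query listing the ids, e.g. `fq=author_id:(a1 OR a2)`. Copying author fields into each blog document (denormalizing) is the other usual option.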

Re: solr cloud concepts

2012-02-08 Thread Mark Miller
On Feb 8, 2012, at 5:26 PM, Adeel Qureshi wrote: okay so after reading Bruno's blog post .. lets add slice to the mix as well .. so we have got collections, cores, shards, partitions and slices :) .. Yeah - heh - this has bugged me, but we have not really all come down on agreement of

Re: Improving performance for SOLR geo queries?

2012-02-08 Thread Nicolas Flacco
I compared locallucene to spatial search and saw a performance degradation, even using geohash queries, though perhaps I indexed things wrong? Locallucene across 6 machines handles 150 queries per second fine, but using geofilt and geohash I got lots of timeouts even when I was doing only 50

Re: usage of /etc/jetty.xml when debugging Solr in Eclipse

2012-02-08 Thread jmlucjav
Yes, I am using https://github.com/alexwinston/RunJettyRun, which apparently is a fork of the original project that originated in the need to use a jetty.xml. So I am already setting an additional jetty.xml; this can be done in the Run configuration, with no need to use the -D param. But as I mentioned

Re: solr cloud concepts

2012-02-08 Thread Jamie Johnson
Mark, is the recommendation now to have each solr instance be a separate core in solr cloud? I had thought that the core name was by default the collection name? Or are you saying that although they have the same name they are separate because they are in different JVMs? On Wednesday, February 8,

Re: solr cloud concepts

2012-02-08 Thread Mark Miller
On Feb 8, 2012, at 9:36 PM, Jamie Johnson wrote: Mark, is the recommendation now to have each solr instance be a separate core in solr cloud? I had thought that the core name was by default the collection name? Or are you saying that although they have the same name they are separate

Re: multiple cores in a single instance vs multiple instances with single core

2012-02-08 Thread Mark Miller
On Feb 8, 2012, at 9:52 PM, Jamie Johnson wrote: In solr cloud what is a better approach / use of resources having multiple cores on a single instance or multiple instances with a single core? What are the benefits and drawbacks of each? It depends I suppose. If you are talking about on a

Re: multiple cores in a single instance vs multiple instances with single core

2012-02-08 Thread Jamie Johnson
Thanks Mark. In regards to failover I completely agree; I am wondering more about performance and memory usage if the indexes are large, and whether the separate Java instances under heavy load would be more or less performant. Currently we deploy a single core per instance but deploy multiple

Re: solr cloud concepts

2012-02-08 Thread Adeel Qureshi
Thanks for the explanation. It makes sense, but I am hoping that you can clarify things a bit more... so now it sounds like in SolrCloud the concept of cores has changed a bit... As you explained, for me to have 2 cores with different schemas I will need 2 different collections... and one

Re: Sorting solrdocumentlist object after querying

2012-02-08 Thread Kashif Khan
No, that sorting is based on multiple fields. Basically I want to sort them like a GROUP BY statement in SQL, based on a few fields and with many loops to go through. The problem is that I have, say, 1,000,000 Solr docs after injecting my few Solr docs, and then I want to group these Solr docs
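
If the merged list fits in memory, sorting by several fields and grouping SQL-style can be done on the client with a composite sort key followed by one pass that groups adjacent equal keys. The sketch uses plain dicts in place of SolrDocument objects, and the field names are hypothetical. With around a million documents, check memory use before going this route.

```python
from itertools import groupby


def group_docs(docs, fields):
    """Sort docs ascending by the given fields, then group runs of equal
    values, similar to SQL GROUP BY. Returns {key_tuple: [docs...]}."""
    key = lambda d: tuple(d[f] for f in fields)
    ordered = sorted(docs, key=key)  # stable: ties keep their input order
    return {k: list(g) for k, g in groupby(ordered, key=key)}


docs = [{"id": 1, "category": "b", "brand": "x"},
        {"id": 2, "category": "a", "brand": "y"},
        {"id": 3, "category": "a", "brand": "y"}]
groups = group_docs(docs, ["category", "brand"])
print(list(groups))  # [('a', 'y'), ('b', 'x')]
```

Sorting first matters because `itertools.groupby` only merges consecutive items with equal keys.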

How do i do group by in solr with multiple shards?

2012-02-08 Thread Kashif Khan
Hi all, I have tried group by in Solr with multiple shards but it does not work. Basically I want to simply do a GROUP BY statement like in SQL, in Solr with multiple shards. Please suggest how I can do this, as it is not currently supported out of the box by Solr. Thanks & regards, Kashif Khan

Re: How to identify the field with highest score in dismax

2012-02-08 Thread Mikhail Khludnev
Hello, have you tried specifying debugQuery=on and looking into the explain section? It's not really performant, but I propose starting from there anyway. Regards On Wed, Feb 8, 2012 at 7:32 PM, crisfromnova crisfromn...@gmail.com wrote: Hi, According solr documentation the dismax score is