Re: Issue with spellcheck and autosuggest

2013-01-29 Thread Artyom
You should check the collations, not the suggestions, in the response XML.

Re: [ANNOUNCE] Web Crawler

2013-01-29 Thread SivaKarthik
Klein, thank you for your reply. I hosted the application on an apache2 server and am able to access the link http://localhost/search/ but while accessing http://localhost/crawler/login.php it shows the error message: Access denied for user 'crawler'@'localhost' (using

Re: [ANNOUNCE] Web Crawler

2013-01-29 Thread SivaKarthik
Hi, I resolved the issue: Access denied for user 'crawler'@'localhost' (using password: YES). The MySQL user crawler/crawler was created and privileges were added as mentioned in the tutorial. Thank you.

Re: indexing Text file in solr

2013-01-29 Thread Edward Garrett
i don't have experience with this but it looks like you could use, from DIH: http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor On Sun, Jan 27, 2013 at 10:23 AM, hadyelsahar hadyelsa...@gmail.com wrote: i have a large Arabic Text File that contains Tweets each line contains one

overlap function query

2013-01-29 Thread Daniel Rosher
Hi, I'm wondering if there exists or if someone has implemented something like the following as a function query: overlap(query,field) = number of matching terms in field/number of terms in field e.g. with three docs having these tokens(e.g.A B C) in a field D 1:A B B 2:A B 3:A The overlap
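
A minimal sketch of the ratio described above, written in plain Python rather than as a Solr function query. It assumes that "matching terms" means field tokens that also occur in the query (duplicates counted) and that both query and field are whitespace-separated; the function name overlap and the sample query "A" are illustrative and not taken from the original post.

```python
def overlap(query, field_text):
    """Fraction of the tokens in field_text that also occur in query.

    Assumption: duplicates in the field are counted, so "A B B" has
    three tokens in total.
    """
    query_terms = set(query.split())
    field_tokens = field_text.split()
    if not field_tokens:
        return 0.0
    matching = sum(1 for token in field_tokens if token in query_terms)
    return matching / len(field_tokens)


# The three example documents from the message; the query "A" is hypothetical.
docs = {1: "A B B", 2: "A B", 3: "A"}
scores = {doc_id: overlap("A", text) for doc_id, text in docs.items()}
```

Under these assumptions, a shorter field that consists mostly of query terms scores higher than a longer field that merely contains them.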

How to disable compression on stored fields in Solr 4.1?

2013-01-29 Thread Artyom
I tried Solr 4.1, reindexed data using DIH (full-import) and compared response time with version 4.0. Response time increased 1.5-2 times. How to disable compression on stored fields in Solr 4.1? I tried to change codec version in solrconfig: <luceneMatchVersion>LUCENE_40</luceneMatchVersion> and

Re: why search time increases without term vectors?

2013-01-29 Thread Upayavira
No, not at all. Presence or not of term vectors won't impact replication in that way. For SolrCloud, it is up to each node to create term vectors when it receives a document for indexing. Using 3.x style replication, the slave will pull all changed files making up changed segments on replication,

Re: How to disable compression on stored fields in Solr 4.1?

2013-01-29 Thread Artyom
I guess, I have to write a new codec that uses a stored fields format which does not compress stored fields such as Lucene40StoredFieldsFormat http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1 What is the purpose of luceneMatchVersion tag then, does it affect

Re: indexVersion returns multiple results when called

2013-01-29 Thread davidq
Hi, We thought we'd sorted this out but it's come back again. We're on 4.0GA which I forgot to mention before. We disabled polling on the slaves and have a PHP script that gets the current version number of the master, fires a reindex (optimize,clean) and then loops on a sleep(120) function at

Re: why search time increases without term vectors?

2013-01-29 Thread Artyom
Yes, I guess, full index replication is a general bug of 4.x. I tried the same routine with termVectors and got the same result: 1. stopped all Solr instances 2. cleared data folders of all instances 3. ran master, made full-import with optimize option using DIH 4. after import ran slave and did

Re: web app :Returning document ID From Solr search

2013-01-29 Thread Michael Della Bitta
If your metadata requirements aren't too heavy, you could store all the title, author, etc. info in Solr along with the index of the full text of the document. Then when you submitted a query to Solr, you could retrieve back the list of information you'd need to display a page of search results,

Issue with mutiple records in full text search

2013-01-29 Thread Soumyanayan Kar
Hi, We are trying to use solr for a text based search solution in a web application. The documents that are getting indexed are essentially text based files like *.txt, *.pdf, etc. We are using the Tika extraction plugin to extract the text content from the files and storing it using a

Re: Issue with mutiple records in full text search

2013-01-29 Thread Jack Krupansky
The number of hits of a term in a Solr document impacts the score, but still only counts as one hit in the numFound count. Solr doesn't track hits for individual term occurrences, except that you could check the term frequency of a specific term in a specific document if you wanted, using a
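
The distinction between document hits and term occurrences can be illustrated with a toy example in plain Python. This is only a sketch of the counting idea, not of how Solr computes numFound or scores; the sample documents and the function name search are made up for illustration.

```python
# Hypothetical mini-corpus: document id -> whitespace-tokenized text.
docs = {
    "doc1": "solr solr lucene",
    "doc2": "lucene index",
    "doc3": "solr search",
}


def search(term, docs):
    """Return (num_found, term_freq) for a single term.

    num_found counts each matching document once, however often the term
    occurs in it; term_freq holds the per-document occurrence count.
    """
    term_freq = {}
    for doc_id, text in docs.items():
        count = text.split().count(term)
        if count:
            term_freq[doc_id] = count
    return len(term_freq), term_freq


num_found, term_freq = search("solr", docs)
```

Here doc1 contains the term twice but still contributes a single hit to the count of matching documents.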

RE: Issue with mutiple records in full text search

2013-01-29 Thread Soumyanayan Kar
Thanks Jack for the explanation. But let's say my requirement is to return all occurrences of the search term, along with the text snippet around them, for each document under the search scope. How do we go about achieving that with Solr? Thanks Regards, Soumya. -Original

Re: How to disable compression on stored fields in Solr 4.1?

2013-01-29 Thread Shawn Heisey
On 1/29/2013 4:57 AM, Artyom wrote: I guess, I have to write a new codec that uses a stored fields format which does not compress stored fields such as Lucene40StoredFieldsFormat http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1 What is the purpose of

Re: Issue with mutiple records in full text search

2013-01-29 Thread Jack Krupansky
I don't know if there is a highlighter option to highlight all hits in a document, as opposed to the snippet for the first. If not, you could write your own highlighter search component to do that. But it may be possible to use pieces of code that are already there. -- Jack Krupansky

Re: How to disable compression on stored fields in Solr 4.1?

2013-01-29 Thread Shawn Heisey
On 1/29/2013 7:40 AM, Shawn Heisey wrote: I don't think there's a way to turn off the stored field compression in the 4.1 index format, but I think there is something else you can do right now - switch to the 4.0 index format. To do this, you need a postingsFormat value of Lucene40 on some or

Re: edismax, qf, multiterm analyzer bug?

2013-01-29 Thread Jack Krupansky
Looks like a bug to me. Actually, when I try it with the Solr 4.0 example, I get: <str name="rawquerystring">O*t*v*h</str> <str name="querystring">O*t*v*h</str> <str name="parsedquery">(+DisjunctionMaxQuery((sku:otvh)))/no_coord</str> <str name="parsedquery_toString">+(sku:otvh)</str> For: curl

thanks for solr 4.1

2013-01-29 Thread Bernd Fehling
Now this must be said, thanks for solr 4.1 (and lucene 4.1)! Great improvements compared to 4.0. After building the first 4.1 index I thought the index was broken, but had no error messages anywhere. Why did I think it was damaged? The index size went down from 167 GB (solr 4.0) to 115 GB (solr

RE: thanks for solr 4.1

2013-01-29 Thread Pires, Guilherme
Subscribed! Just integrating solr 4.1 in a corporate GIS architecture as we speak. Thanks! Guilherme Pires -Original Message- From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] Sent: terça-feira, 29 de Janeiro de 2013 15:34 To: solr-user@lucene.apache.org Subject: thanks for

Solr Data Config Queries per Field

2013-01-29 Thread O. Olson
Hi, I am new to Solr, and I am using the DataImportHandler to Query a SQL Server and populate Solr. I specify the SQL Query in the db-data-config.xml file. Each SQL Query seems to be associated with an entity. Is it possible to have a query per field? I think it would be easier to explain

Re: Solr Data Config Queries per Field

2013-01-29 Thread Gora Mohanty
On 29 January 2013 22:42, O. Olson olson_...@yahoo.it wrote: [...] SQL Database Schema: Table: Prod_Table Column 1: SKU - ID/Primary Key Column 2: Title Table: Cat_Table Column 1: SKU - Foreign Key Column 2: CategoryLevel Column 3: CategoryName Where CategoryLevel is 1, I would like

Re: Solr Data Config Queries per Field

2013-01-29 Thread O. Olson
Gora Mohanty-3 wrote On 29 January 2013 22:42, O. Olson <olson_ord@...> wrote: [...] SQL Database Schema: Table: Prod_Table Column 1: SKU - ID/Primary Key Column 2: Title Table: Cat_Table Column 1: SKU - Foreign Key Column 2: CategoryLevel Column 3: CategoryName Where

Re: Solr Data Config Queries per Field

2013-01-29 Thread Gora Mohanty
On 29 January 2013 23:34, O. Olson olson_...@yahoo.it wrote: [...] Thank you. Good call Gora, I forgot to mention the query. I am trying to query something like the following in the URL for the Example: http://localhost:8983/solr/db/select?q=query&facet=true&facet.field=Category1 I

Re: Problem with migration from solr 3.5 with SOLR-2155 usage to solr 4.0

2013-01-29 Thread Smiley, David W.
The wiki is open to everyone. If you do edit it, please try to keep it organized. On 1/24/13 9:41 AM, Viacheslav Davidovich viacheslav.davidov...@objectstyle.com wrote: Hi David, thank you for your answer. After update to this field type and change the SOLR query I receive required

Re: MERGING SPATIAL SEARCH QUERY

2013-01-29 Thread Smiley, David W.
Hi Jaspreet. Your post is confusing. You're using spatial, so you say, yet your question suggests you have yet to use it. If your documents are associated with a city, then you should index the lat-lon location of that city in your documents. It's denormalized like this. ~ David On 1/23/13

Re: overlap function query

2013-01-29 Thread Mikhail Khludnev
Daniel, You can start from here http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/Similarity.html#coord%28int,%20int%29 but it requires deep understanding of Lucene internals On Tue, Jan 29, 2013 at 2:12 PM, Daniel Rosher rosh...@gmail.com wrote: Hi, I'm

RE: Solr load balancer

2013-01-29 Thread Phil Hoy
Hi Erick, Thanks, I have read the blogs you cited and I found them very interesting, and we have tuned the jvm accordingly but still we get the odd longish gc pause. That said we perhaps have an unusual setup; we index a lot of small documents using servers with ssd's and 128 GB RAM in a

RE: Solr Faceting with Name Values

2013-01-29 Thread Petersen, Robert
Hi O.O 1. Yes faceting on field function_s would return all the facet values in the search results with their counts. 2. You would probably have to join the names together with a special character and then split them later in the UI. 3. I'm sure there is a way to query the index for all
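
The join-and-split approach from point 2 could look roughly like the following sketch. It assumes two values are packed into one facet string (here a hypothetical code and display name) and that the separator character never occurs in either value; the "|" separator and both function names are illustrative choices, not something specified in the thread.

```python
SEPARATOR = "|"  # assumed delimiter; pick a character that cannot occur in the data


def join_facet_value(code, name):
    """Pack two values into a single string suitable for indexing as one facet value."""
    if SEPARATOR in code or SEPARATOR in name:
        raise ValueError("separator must not appear in the values")
    return f"{code}{SEPARATOR}{name}"


def split_facet_value(value):
    """Unpack a facet value returned by the search back into its two parts."""
    code, name = value.split(SEPARATOR, 1)
    return code, name


packed = join_facet_value("42", "Power Tools")
unpacked = split_facet_value(packed)
```

The UI would apply the split to each facet value before display, so the index itself only ever sees the combined string.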

Re: DIH datasource configuration

2013-01-29 Thread Lapera-Valenzuela, Elizabeth [Primerica]
Is there a way to pass in password and user to datasource in db-config xml file? Thanks.

queryResultCache *very* low hit ratio

2013-01-29 Thread Petersen, Robert
Hi solr users, My queryResultCache hitratio has been trending down lately and is now at 0.01%, and also its warmup time was almost a minute. I have lowered the autowarm count dramatically since there are no hits anyway. I also wanted to lower my autowarm counts across the board because I am

replicateOnStartup not finding commits after SOLR-3911?

2013-01-29 Thread Gregg Donovan
In the process of upgrading to 4.1 from 3.6, I've noticed that our master servers do not show any commit points available until after a new commit happens. So, for static indexes, replication doesn't happen and for dynamic indexes, we have to wait until an incremental update of master for slaves

Re: DIH datasource configuration

2013-01-29 Thread Gora Mohanty
On 30 January 2013 01:52, Lapera-Valenzuela, Elizabeth [Primerica] elizabeth.lap...@primerica.com wrote: Is there a way to pass in password and user to datasource in db-config xml file? Thanks. Do you mean something beyond what is covered in the Solr DIH Wiki page:

Re: replicateOnStartup not finding commits after SOLR-3911?

2013-01-29 Thread Mark Miller
On Jan 29, 2013, at 3:50 PM, Gregg Donovan gregg...@gmail.com wrote: should we just try uncommenting that line in ReplicationHandler? Please try. I'd file a JIRA issue in any case. I can probably take a closer look. - Mark

Re: Multiple-fields multilingual indexing - Query expansion for multilingual fields

2013-01-29 Thread Eduard Moraru
Hi Alex, On Wed, Jan 23, 2013 at 7:47 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: On Wed, Jan 23, 2013 at 12:23 PM, Eduard Moraru enygma2...@gmail.com wrote: The only small but workable problem I have now is the same as https://issues.apache.org/jira/browse/SOLR-3598. When you are

Traditional replication behind SolrCloud

2013-01-29 Thread Mingfeng Yang
Our application of Solr is somewhat atypical. We constantly feed Solr with lots of documents grabbed from the internet, and NRT searching is not required. A typical search will return millions of results, and query responses need to be as fast as possible. Since in SolrCloud environment, indexing

Re: Solr Data Config Queries per Field

2013-01-29 Thread O. Olson
Gora Mohanty-3 wrote Yes, things should function as you describe, and no you should not need any change in your schema from changing the DIH configuration file. Please take a look at http://wiki.apache.org/solr/SolrFacetingOverview#Facet_Indexing for how best to define faceting fields. Also,

Re: indexing Text file in solr

2013-01-29 Thread Jan Høydahl
If you're lucky, the file has a format suitable for the CSV update handler http://wiki.apache.org/solr/UpdateCSV Note that if your file does not contain a unique ID, you can generate those http://wiki.apache.org/solr/UniqueKey You can then use CURL or http://wiki.apache.org/solr/post.jar to index
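
If the input is a plain text file with one record per line, one way to prepare it for a CSV-based import is to convert it first and generate an ID per line. The sketch below uses only the Python standard library; the column names id and text are assumptions and would have to match the fields in your own schema, and the random UUIDs are just one possible way to produce unique keys.

```python
import csv
import io
import uuid


def lines_to_csv(lines, out):
    """Write one CSV row per non-empty input line, with a generated unique id.

    The header names "id" and "text" are hypothetical; adjust them to the schema.
    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(["id", "text"])
    count = 0
    for line in lines:
        text = line.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        writer.writerow([str(uuid.uuid4()), text])
        count += 1
    return count


# Small in-memory demonstration; a real run would pass open file objects instead.
buffer = io.StringIO()
written = lines_to_csv(["first tweet\n", "\n", "second, with a comma\n"], buffer)
```

The csv module takes care of quoting, so lines that contain commas or quotes still end up in a single column.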

Re: In DIH, does column vs. name depend on data source?

2013-01-29 Thread Alexandre Rafalovitch
And I have just confirmed that this is indeed the case (unless I lost my mind). This must be causing great confusion for anybody trying to piece together examples from multiple places in the DIH wiki. The example I had just now was: 1) I had this in

Re: Multiple-fields multilingual indexing - Query expansion for multilingual fields

2013-01-29 Thread Alexandre Rafalovitch
On Tue, Jan 29, 2013 at 4:39 PM, Eduard Moraru enygma2...@gmail.com wrote: Now, what worries me a bit is the fact that I have a copyField set up from title_* to title_ml to do what I have mentioned above. copyField is not recursive, nor chained. Even if some people wished it was (chained).

RE: Solr Faceting with Name Values

2013-01-29 Thread O. Olson
Thank you Robi for the information. I will be looking into this esp. the implementation. Having to join the names together and then split them later is something I have to discuss with my team. O. O. Petersen, Robert wrote Hi O.O 1. Yes faceting on field function_s would return all the

Re: queryResultCache *very* low hit ratio

2013-01-29 Thread Shawn Heisey
On 1/29/2013 1:36 PM, Petersen, Robert wrote: My queryResultCache hitratio has been trending down lately and is now at 0.01%, and also its warmup time was almost a minute. I have lowered the autowarm count dramatically since there are no hits anyway. I also wanted to lower my autowarm

Re: queryResultCache *very* low hit ratio

2013-01-29 Thread Yonik Seeley
One other thing that some auto-warming of the query result cache can achieve is loading FieldCache entries for sorting / function queries so real user queries don't experience increased latency. If you remove all auto-warming of the query result cache, you may want to add static warming entries

Re: replicateOnStartup not finding commits after SOLR-3911?

2013-01-29 Thread Gregg Donovan
Thanks, Mark -- that fixed the issue for us. I created https://issues.apache.org/jira/browse/SOLR-4380 to track it. On Tue, Jan 29, 2013 at 4:06 PM, Mark Miller markrmil...@gmail.com wrote: On Jan 29, 2013, at 3:50 PM, Gregg Donovan gregg...@gmail.com wrote: should we just try uncommenting

RE: queryResultCache *very* low hit ratio

2013-01-29 Thread Petersen, Robert
Thanks Yonik, I'm cooking up some static warming queries right now, based upon our commonly issued queries. I've already been noticing occasional long running queries. Our web farm times out a search after twenty seconds and issues an exception. I see a few of these every day and am trying

RE: queryResultCache *very* low hit ratio

2013-01-29 Thread Petersen, Robert
Hi Shawn, Since my solr services power product search for a large retail web site with over fourteen million unique products, I suspect the main reason for the low hit rate is many unique user queries. We're expanding our product count and product type categories every day as fast as

Re: How to migrate SolrCloud shards to different servers?

2013-01-29 Thread Mingfeng Yang
An experiment showed that stopping all shards, removing zoo_data (assuming your ZooKeeper is used only for this particular SolrCloud; otherwise, be cautious), and then starting the instances in order works fine. Ming On Sat, Jan 26, 2013 at 5:31 AM, Per Steffensen st...@designware.dk wrote: Hi We have

Re: edismax, qf, multiterm analyzer bug?

2013-01-29 Thread Ahmet Arslan
Looks like a bug to me. Thanks Jack for the reply. I created SOLR-4382 for this.

Re: How to migrate SolrCloud shards to different servers?

2013-01-29 Thread Timothy Potter
Just one suggestion, instead of stopping zk and removing zoo_data, better to use Solr's zkcli.sh script from cloud-scripts to clear out data, e.g. zkcli.sh -zkhost localhost:9983 -cmd clear /solr The paths I clear when I want a full clean-up are: /configs/CONFIG_NAME

Re: small QTime but slow results to user

2013-01-29 Thread S L
I'm just writing to close the loop on this issue. I moved my servlet to a beefier server with lots of RAM. I also cleaned up the data to make the index somewhat smaller. And, I turned off all the caches since my application doesn't benefit very much from caching. My application is now quite

Using FieldCache in SolrIndexSearcher for distributed id retrieval

2013-01-29 Thread Michael Ryan
Following up from a post I made back in 2011... I am a user of Solr 3.2 and I make use of the distributed search capabilities of Solr using a fairly simple architecture of a coordinator + some shards. Correct me if I am wrong: In a standard distributed search with QueryComponent, the