Blogpost about SOLR at Issuu

2013-02-13 Thread Martin Koch
Hi list, I have written a blog post about the use of SOLR for searching at Issuu (http://www.issuu.com). To give you a sense of the scale, Issuu indexes more than 9 million documents and 200 million pages. In January Issuu had 4.3 billion pageviews and over 125.8 million visits (60.1 unique). You

Re: DIH Delete with Full Import

2013-02-13 Thread Ahmet Arslan
define something like postImportDeleteQuery = Select Id from delete_log_table. Can someone provide me an example? The postImportDeleteQuery and preImportDeleteQuery values are Lucene/Solr queries. For example, I am using the following: preImportDeleteQuery=document_type:(photo OR news OR video

Exception while using CloudServer

2013-02-13 Thread J Mohamed Zahoor
Hi, I was trying to connect to SolrCloud using CloudServer, and I get the following exception. I tried clearing the ZooKeeper state and then restarting the Solr instances, but I still get the same exception. Am I missing something? org.apache.solr.common.cloud.ZkStateReader: Updating cluster state

Re: Exception while using CloudServer

2013-02-13 Thread J Mohamed Zahoor
I am using Solr 4.0. ./zahoor On 13-Feb-2013, at 3:56 PM, J Mohamed Zahoor zah...@indix.com wrote: Hi I was trying to connect to solr cloud using CloudServer, I get the following exception. I tried clearing the zookeeper state and then restarting the solr instances, still i get the

Re: Exception while using CloudServer

2013-02-13 Thread J Mohamed Zahoor
Hi I think the router:compositeId value inside the cluster state is creating this problem. ./Zahoor On 13-Feb-2013, at 4:06 PM, J Mohamed Zahoor zah...@indix.com wrote: I am using Solr 4.0. ./zahoor On 13-Feb-2013, at 3:56 PM, J Mohamed Zahoor zah...@indix.com wrote: Hi I

Re: Exception while using CloudServer

2013-02-13 Thread J Mohamed Zahoor
Apologies... I was using 4.1 in solr server and 4.0 in solrj client which caused this problem. ./zahoor On 13-Feb-2013, at 4:08 PM, J Mohamed Zahoor zah...@indix.com wrote: Hi I think the router:compositeId value inside the cluster state is creating this problem. ./Zahoor

Send Input Through Json into solr

2013-02-13 Thread anurag.jain
Hey, I want to send the query input through a JSON file rather than giving query parameters. Is there any way to do that? When I give query parameters, I get a response, and in the response there is a key holding the parameters. If I could send those parameters as JSON, it would be easier for me. Let's say the input

Re: Send Input Through Json into solr

2013-02-13 Thread Sebastian Saip
I'm not sure if I understood you. You want to send a request like http://localhost/solr/select?q=*:*&wt=json&start=0&fq=course_id:18 and get back only parts of the response for further processing? Then the easiest way is to retrieve the whole JSON and post-process only responseHeader.params. BR
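The suggestion above (fetch the whole JSON response, then look only at responseHeader.params) can be sketched client-side as follows. This is a minimal illustration, not part of the original reply: the helper names and the sample response body are made up, and it assumes the server echoes the request parameters in the response header.

```python
import json
from urllib.parse import urlencode

# Base URL taken from the example request; adjust host and path to your setup.
SOLR_SELECT = "http://localhost/solr/select"


def build_select_url(params):
    """Turn a dict of query parameters into a select URL (values are URL-encoded)."""
    return SOLR_SELECT + "?" + urlencode(params)


def echoed_params(response_text):
    """Return only the responseHeader.params section of a JSON response body."""
    return json.loads(response_text)["responseHeader"]["params"]


# The parameters could just as well be loaded from a JSON file with json.load().
params = {"q": "*:*", "wt": "json", "start": 0, "fq": "course_id:18"}
url = build_select_url(params)

# Made-up response body, shaped like a JSON response with echoed parameters.
sample = (
    '{"responseHeader": {"status": 0, "params": '
    '{"q": "*:*", "wt": "json", "start": "0", "fq": "course_id:18"}}, '
    '"response": {"numFound": 0, "docs": []}}'
)

print(url)
print(echoed_params(sample))
```

In a real client the sample string would be replaced by the body returned from the HTTP request.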

SOLR and phrase offsets.

2013-02-13 Thread Vitaly_Artemov
Hi All, I am trying to understand how I can get phrase offsets as a result of a search using the SOLRJ client. I have only one field: <field name="contents" type="text_general" indexed="true" stored="false" termVectors="true" termPositions="true" termOffsets="true" /> I don't want to save the field content in the index.

Re: Send Input Through Json into solr

2013-02-13 Thread Sebastian Saip
Ok, I see - you want to send a JSON Object which contains the query parameters. As far as I know, that's not possible out-of-the-box, so you'll have to create a custom SearchHandler (http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/handler/component/SearchHandler.html) for that. In

Why SolrInputDocument use a LinkedHashMap

2013-02-13 Thread knort
Programming some tests, I found that two SolrInputDocuments with the same fields and values are different. Trying to figure out why this happens, I found that the SolrInputDocument class uses a LinkedHashMap. Is the insert order of the fields important for Solr? Thank you

Combining Solr score with customized user ratings for a document

2013-02-13 Thread Á_____o
Hi: I am working on a project where we want to recommend products to our users based on their previous 'likes', purchases and so on (typical stuff of a recommender system), while we want to let them freely browse the catalogue by search queries, making use of facets, more-like-this and so on

Re: Why SolrInputDocument use a LinkedHashMap

2013-02-13 Thread Andre Bois-Crettez
Maybe it is more about having fast iterations even on a large collection of fields? André On 02/13/2013 12:43 PM, knort wrote: Programming some tests I found that two SolrInputDocuments with the same fields and values are different. Trying to figure it out why it's happening I found that the

RE: Search over dynamic fields

2013-02-13 Thread Pragyanshis Pattanaik
Formatted the mail again. Hi, I have two dynamic fields like Product-Name-* and Product-Rating-*. One document can contain 5 products and respective ratings like below. <str name="Product-Name-0">HTC Wildfire S</str><str name="Product-Name-1">Samsung Tab 2</str><str name="Product-Name-2">Samsung Note</str><str

What should focus be on hardware for solr servers?

2013-02-13 Thread Matthew Shapiro
We are beginning talks with our IT department and management about switching from the Google Search Appliance to Solr. One thing that we need to figure out is what kind of hardware we are going to require to host the Solr systems. What type of hardware (at a high level) should I be looking for?

Index-time synonyms and trailing wildcard issue

2013-02-13 Thread Johannes Rodenwald
Hi, I use Solr 3.6.0 with a synonym filter as the last filter at index time, using a list of stemmed terms. When I do a wildcard search that matches a part of an entry on the synonym list, the synonyms found are used by Solr to generate the search results. I am trying to disable that

Most common query

2013-02-13 Thread ROSENBERG, YOEL (YOEL)** CTR **
Hi, I have a question and hope you can help me. I would like to get a report, using the Solr admin tools, of all the searches made on the system between two dates. What is the correct way to do it? BR, Yoel Yoel Rosenberg ALCATEL-LUCENT Support Engineer

Re: Index-time synonyms and trailing wildcard issue

2013-02-13 Thread Jack Krupansky
By doing synonyms at index time, you cause apfelsin to be added to documents that contain only orang, so of course documents that previously only contained orang will now match for apfelsin or any term query that matches apfelsin, such as a wildcard. At query time, Lucene cannot tell whether
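The effect described in this reply can be reproduced with a toy inverted index. The sketch below is only an illustration in plain Python, not Lucene code: the documents, the synonym table and the function names are made up, and shell-style pattern matching stands in for a wildcard query.

```python
import fnmatch
from collections import defaultdict

# Toy synonym list of stemmed terms, using the two terms from the reply.
SYNONYMS = {"orang": ["apfelsin"], "apfelsin": ["orang"]}


def build_index(docs, expand_synonyms):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
            if expand_synonyms:
                # Index-time expansion: the synonym is stored as if it were in the text.
                for syn in SYNONYMS.get(term, []):
                    index[syn].add(doc_id)
    return index


def wildcard_search(index, pattern):
    """Return ids of documents holding any indexed term that matches the pattern."""
    hits = set()
    for term, doc_ids in index.items():
        if fnmatch.fnmatchcase(term, pattern):
            hits |= doc_ids
    return hits


docs = {1: ["orang", "saft"], 2: ["apfelsin"], 3: ["banan"]}
plain = build_index(docs, expand_synonyms=False)
expanded = build_index(docs, expand_synonyms=True)

print(sorted(wildcard_search(plain, "apfel*")))     # [2]
print(sorted(wildcard_search(expanded, "apfel*")))  # [1, 2]
```

Once the synonym has been written into the index, the search side has no record of which postings came from the original text and which from the expansion, which is why document 1 shows up for the wildcard.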

Re: Possible issue in edismax?

2013-02-13 Thread Felipe Lahti
Cool that it worked :) I had this same problem in my project a few months ago. On Tue, Feb 12, 2013 at 12:57 PM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Felipe, Just a short note to say thanks for your valuable suggestion. I had implemented that and could see expected results. The length

RE: What should focus be on hardware for solr servers?

2013-02-13 Thread Toke Eskildsen
Matthew Shapiro [m...@mshapiro.net] wrote: [Hardware for Solr] What type of hardware (at a high level) should I be looking for. Are the main constraints disk I/O, memory size, processing power, etc...? That depends on what you are trying to achieve. Broadly speaking, simple search and

Re: Blogpost about SOLR at Issuu

2013-02-13 Thread Andre Bois-Crettez
Thanks, very interesting. The admin interface is very useful (although it would be useful with a sample admin-extras.html file somewhere - where it should go and what can go in it would be good to know. Right now, all we get is an exception in the logs about the file not existing). You only

Re: Solr 4.1.0 not using solrcore.properties ?

2013-02-13 Thread Daniel Rijkhof
I am looking at the source code of 4.1.0 and I cannot find any proof that Solr 4.1.0's DIH would actually use any properties from the solrcore.properties file. I did, however, find that Solr does load my solrcore.properties file... It's strange that this would have been changed. Does anybody have

Re: What should focus be on hardware for solr servers?

2013-02-13 Thread Matthew Shapiro
Thanks for the reply. If the main amount of searches are the exact same (e.g. the empty search), the result will be cached. If 5,683 searches/month is the real count, this sounds like a very low amount of searches in a very limited corpus. Just about any machine should be fine. I guess I am

Re: What should focus be on hardware for solr servers?

2013-02-13 Thread Michael Della Bitta
Matthew, With an index that small, you should be able to build a proof of concept on your own hardware and discover how it performs using something like SolrMeter: Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY

Re: What should focus be on hardware for solr servers?

2013-02-13 Thread Michael Della Bitta
Ooops: https://code.google.com/p/solrmeter/ Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Wed, Feb 13, 2013 at 12:25 PM, Michael Della Bitta

Re: What should focus be on hardware for solr servers?

2013-02-13 Thread Matthew Shapiro
That definitely will be a useful tool in this conversion, thanks. On Wed, Feb 13, 2013 at 12:25 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Ooops: https://code.google.com/p/solrmeter/ Michael Della Bitta Appinions 18

Re: Blogpost about SOLR at Issuu

2013-02-13 Thread Shawn Heisey
On 2/13/2013 9:52 AM, Andre Bois-Crettez wrote: Thanks, very interesting. The admin interface is very useful (although it would be useful with a sample admin-extras.html file somewhere - where it should go and what can go in it would be good to know. Right now, all we get is an exception in the

RE: Solr 4.1.0 not using solrcore.properties ?

2013-02-13 Thread Dyer, James
The code that resolves variables in DIH was refactored extensively in 4.1.0. So if you've got a case where it does not resolve the variables properly, please give the details. We can open a JIRA issue and get this fixed. James Dyer Ingram Content Group (615) 213-4311 -Original

RE: Solr load balancer

2013-02-13 Thread Phil Hoy
Hi, I have opened a couple of JIRA issues, one to make the HttpShardHandlerFactory and LBHttpSolrServer more easily extended: https://issues.apache.org/jira/browse/SOLR-4448 and one with an implementation of a backup requesting load balancer: https://issues.apache.org/jira/browse/SOLR-4449. The

Re: Boost Specific Phrase

2013-02-13 Thread Amit Nithian
Have you looked at the pf parameter for dismax handlers? I think pf does what you are looking for, which is to boost documents where the query terms match exactly as a phrase in the various fields, with some phrase slop. On Wed, Feb 13, 2013 at 2:59 AM, Hemant Verma hemantverm...@gmail.com wrote: Hi All I

Re: what do you use for testing relevance?

2013-02-13 Thread Amit Nithian
Ultimately this is dependent on what your metrics for success are. For some places it may be just raw CTR (did my click through rate increase) but for other places it may be a function of money (either it may be gross revenue, profits, # items sold etc). I don't know if there is a generic answer

Re: replication problems with solr4.1

2013-02-13 Thread Amit Nithian
So just a hunch... but when the slave downloads the data from the master, doesn't it do a commit to force solr to recognize the changes? In so doing, wouldn't that increase the generation number? In theory it shouldn't matter because the replication looks for files that are different to determine

Re: SolrCloud and hardcoded 'id' field

2013-02-13 Thread Mark Miller
A search for id is much too broad. I looked at 3 of the SolrCloud classes you mention and none of those id's have anything to do with the unique field in the schema. I have not looked at the hash based router, but if you find a real issue then please file a JIRA issue. - Mark On Feb 12,

Re: replication problems with solr4.1

2013-02-13 Thread Mark Miller
On Feb 13, 2013, at 1:17 PM, Amit Nithian anith...@gmail.com wrote: doesn't it do a commit to force solr to recognize the changes? yes. - Mark

Re: SolrCloud and hardcoded 'id' field

2013-02-13 Thread Mark Miller
Ah, you mention most of the SolrCloud ones don't look like a problem. The other two then: 1. RealTimeGetComponent - doesn't look like a schema field usage, don't see a problem. 2. HashBasedRouter - looks like it could be a problem and this is new for 4.1 - this is something we should document

SolrCloud : $SOLR_HOME/solr.xml

2013-02-13 Thread Anirudha Jadhav
Is there a strong reason why we still need solr.xml on disk, and why it cannot be persisted in and used from ZooKeeper? thanks, -- Anirudha P. Jadhav

Re: Why SolrInputDocument use a LinkedHashMap

2013-02-13 Thread Chris Hostetter
: Is the insert order of the fields important for Solr? svn blame can frequently be useful for understanding why specific choices were made... http://svn.apache.org/viewvc?view=revision&revision=604951 https://issues.apache.org/jira/browse/SOLR-439 ...nut shell: it may not matter to you what

Re: SolrCloud : $SOLR_HOME/solr.xml

2013-02-13 Thread Mark Miller
Yes, though the reasons are not so interesting. Soon solr.xml is going away regardless - perhaps in another release or two. - mark On Feb 13, 2013, at 2:02 PM, Anirudha Jadhav aniru...@nyu.edu wrote: is there a strong reason why we still need solr.xml on disk and it cannot be persisted and

RE: What should focus be on hardware for solr servers?

2013-02-13 Thread Toke Eskildsen
Matthew Shapiro [m...@mshapiro.net] wrote: Sorry, I should clarify our current statistics. First of all I meant 183k documents (not 183, woops). Around 100k of those are full fledged html articles (not web pages but articles in our CMS with html content inside of them), If an article is

Re: Facet maxcount?

2013-02-13 Thread Chris Hostetter
there's an open feature request about this, but part of the problem is that it's extremely hard to implement something like this efficiently in a distributed query... https://issues.apache.org/jira/browse/SOLR-1712 : Date: Wed, 6 Feb 2013 20:03:16 -0800 : From: Neelesh

Re: suggest only from certain documents

2013-02-13 Thread Chris Hostetter
: suggester simply looks at the terms in the index and returns some of them, : it's not aware (that I know of) of which docs the terms came from, so I I'm not certain, but isn't this where the spellcheck.collate option can be used?

Re: replication problems with solr4.1

2013-02-13 Thread Amit Nithian
Okay, so then that should explain the generation difference of 1 between the master and slave. On Wed, Feb 13, 2013 at 10:26 AM, Mark Miller markrmil...@gmail.com wrote: On Feb 13, 2013, at 1:17 PM, Amit Nithian anith...@gmail.com wrote: doesn't it do a commit to force solr to recognize the

Re: auto-complete with typo fuzzy suggests

2013-02-13 Thread Jack Krupansky
Try the spellchecker rather than the suggester/auto-complete: http://wiki.apache.org/solr/SpellCheckComponent -- Jack Krupansky -Original Message- From: ALEX PKB Sent: Wednesday, February 13, 2013 2:34 PM To: solr-user@lucene.apache.org Subject: auto-complete with typo fuzzy

Re: DIH Delete with Full Import

2013-02-13 Thread Kiran J
Thank you Ahmet. I figured it out. I had to define a separate entity which takes care of deletes. entity name=DeleteEntity query= SELECT ID AS [$deleteDocById] FROM Log WHERE '${dataimporter.request.clean}' = 'false' AND Log_Date = '${dataimporter.last_index_time}'

RE: suggest only from certain documents

2013-02-13 Thread Dyer, James
The key to get this working is to set spellcheck.maxCollationTries > 0. It will generate collations even if there is only 1 term. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, February 13,

Re: what do you use for testing relevance?

2013-02-13 Thread Roman Chyla
All, Thank you for your comments and links, I will explore them. I think that many people face similar questions when they tune their search engines, especially in the Solr/Lucene community. While the requirements will be different, ultimately it is what they can do with Lucene/Solr that guides

Re: What should focus be on hardware for solr servers?

2013-02-13 Thread Matthew Shapiro
Excellent, thank you very much for the reply! On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Matthew Shapiro [m...@mshapiro.net] wrote: Sorry, I should clarify our current statistics. First of all I meant 183k documents (not 183, woops). Around 100k of

Re: Minimum word length for stemming

2013-02-13 Thread Chris Hostetter
: Thanks for confirming my suspicions, the custom : TokenLengthMarkerFilterFactory sounds like the best approach for doing this. that sounds like something that could be generally useful to lots of people ... by all means please open a jira issue and attach whatever you come up with for

Re: Solr 4.1.0 not using solrcore.properties ?

2013-02-13 Thread Daniel Rijkhof
James, I debugged it until I found where things go 'wrong'. Apparently the current implementation of VariableResolver does not allow the use of a period '.' in any variable/property key you want to use... It's reserved for namespaces. Personally I would really love to use a period in my
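To illustrate why a period reserved for namespaces clashes with property keys that contain a period, here is a toy resolver in plain Python. It is only a sketch of the general idea and is not the DIH VariableResolver code; the function name and the db.url key are hypothetical.

```python
def resolve(name, namespaces):
    """Resolve a dotted name by walking nested dicts, one level per period."""
    current = namespaces
    for part in name.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


# 'dataimporter' acts as a namespace that holds 'last_index_time'.
nested = {"dataimporter": {"last_index_time": "2013-02-13 00:00:00"}}

# A flat property whose key itself contains a period (hypothetical key name).
flat = {"db.url": "jdbc:example"}

print(resolve("dataimporter.last_index_time", nested))  # 2013-02-13 00:00:00
print(resolve("db.url", flat))                          # None
```

In this toy version the second lookup fails because "db" is treated as a namespace that does not exist, even though the flat key "db.url" is present.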

RE: Can't determine Sort Order: 'prijs ASC', pos=5

2013-02-13 Thread Michael Ryan
I think the order needs to be in lowercase. Try asc instead of ASC. -Michael -Original Message- From: PeterKerk [mailto:vettepa...@hotmail.com] Sent: Wednesday, February 13, 2013 7:30 PM To: solr-user@lucene.apache.org Subject: Can't determine Sort Order: 'prijs ASC', pos=5 On this
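If the lowercase suggestion above turns out to be the cause, a client can normalize the direction keyword before sending the request. The helper below is a hypothetical client-side sketch; it only handles plain "field direction" clauses separated by commas, not function queries that contain commas.

```python
def normalize_sort(sort_spec):
    """Lowercase the direction keyword of each 'field direction' sort clause."""
    clauses = []
    for clause in sort_spec.split(","):
        field, direction = clause.split()
        direction = direction.lower()
        if direction not in ("asc", "desc"):
            raise ValueError("unknown sort direction: " + direction)
        clauses.append(field + " " + direction)
    return ",".join(clauses)


print(normalize_sort("prijs ASC"))  # prijs asc
```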

RE: Search over dynamic fields

2013-02-13 Thread Pragyanshis Pattanaik
Or is there a way to achieve this using the EDismax query parser? From: pragyans...@outlook.com To: solr-user@lucene.apache.org Subject: RE: Search over dynamic fields Date: Wed, 13 Feb 2013 19:09:24 +0530 Formatted the mail again. Hi, I have two dynamic fields like Product-Name-* and

Re: Boost Specific Phrase

2013-02-13 Thread Amit Nithian
Ah yes, sorry, I misunderstood. Another option is to use n-grams so that projectmanager is a term, so any query involving project manager in india with 2 years experience would match higher because the query would contain projectmanager as a term. On Wed, Feb 13, 2013 at 9:56 PM, Hemant Verma
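The word-level n-gram idea above can be sketched in a few lines of plain Python. This is only an illustration of how adjacent words become a single term such as projectmanager; the function name is made up, and in Solr itself this kind of token would normally come from an analysis chain (for example a shingle filter with an empty token separator) rather than client code.

```python
def word_shingles(text, size=2):
    """Join each run of `size` adjacent words into a single token."""
    words = text.lower().split()
    return ["".join(words[i:i + size]) for i in range(len(words) - size + 1)]


query = "project manager in india with 2 years experience"
tokens = word_shingles(query)
print(tokens)
```

With the same treatment applied at index time, a document containing "project manager" would hold the term projectmanager, and the query above would contain it as well.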

Re: Why a phrase is getting searched against default fields in solr

2013-02-13 Thread Ahmet Arslan
Hi Pragyanshis, What happens when you remove bq parameter? --- On Thu, 2/14/13, Pragyanshis Pattanaik pragyans...@outlook.com wrote: From: Pragyanshis Pattanaik pragyans...@outlook.com Subject: Why a phrase is getting searched against default fields in solr To: solr Forum

Re: which analyzer is used for facet.query?

2013-02-13 Thread Tommaso Teofili
I agree that's definitely strange, I'll have a look at it. Tommaso 2013/2/12 Chris Hostetter hossman_luc...@fucit.org : So it seems that facet.query is using the analyzer of type index. : Is it a bug or is there another analyzer type for the facet query? That doesn't really make any

Re: replication problems with solr4.1

2013-02-13 Thread Bernd Fehling
OK, then index generation and index version are out of the question when it comes to verifying that the master and slave indexes are in sync. What else is possible? The strange thing is that if the master is 2 or more generations ahead of the slave, then it works! With your logic the slave must _always_ be one generation