Re: Search and Entity structure

2012-10-26 Thread v vijith
Hi, Dear All, Apologies for the lengthy email. SOLR Version: 4. I'm a newbie to SOLR and have gone through the tutorial but could not get a solution. The below requirement doesn't seem to be impossible, but I think I'm missing the obvious. In my RDBMS, there is a Qualification table and an Employee

Re: Search and Entity structure

2012-10-26 Thread adityab
Hi Vijith, See if this solution solves your problem. There might be other ways; this is the one I have on top of my mind at this hour. You might have an ID for each qualification; then express the relation using dotted notation. 1 = MBA, 2 = LEAD etc. <arr name="grade"> <str>1.A</str> <str>2.B</str>

Re: DIH nested entities don't work

2012-10-26 Thread mroosendaal
Hi, I tried giving the pdt_id from the subentity a specific value and it worked. Only now every product has the same value. I tried a different subentity with the construct subentity.pdt_id='${entity.pdt_id}' with the same result as above, all products had a songtitle with the same value. So

Re: Facet date/range + facet.mincount + distributed search issue

2012-10-26 Thread Dovao Jimenez, Oscar
Dear all, Using facet date/range on a date typed field and on a distributed search between schema compatible cores, the use of facet.mincount=1 brings a cut down number of facet values (over 500 facet values expected, 5 facet values retrieved). I wonder whether facet.mincount is supported on

[Solr boost] Date boost for certain query set

2012-10-26 Thread Alessandro Benedetti
Hi guys, I was fighting with the boost factor in my edismax request handler: <lst name="appends"> <str name="defType">edismax</str> <str name="bq">(idente:2)^0.1</str> <str name="bf">recip(ms(NOW,data),3.16e-11,1,1)</str> </lst> I'm playing with bq (boost query) and bf (boost function). Is

Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
On Wed, Oct 24, 2012 at 4:33 PM, Walter Underwood wun...@wunderwood.org wrote: Please consider never running optimize. That should be called force merge. Thanks. I have been letting the system run for about two days already without an optimize. I will let it run a week, then merge to see the

Index-time field boosting

2012-10-26 Thread vsl
Hi, I have a problem with index time boosting. I created 4 new fields: <field name="alltext" type="text_general" indexed="true" stored="false" multiValued="true" omitNorms="false"/> <field name="title" type="text_general" indexed="true" stored="true" multiValued="true" boost="5.0" omitNorms="false"/>

how solr to boost term value at the start of ther field?

2012-10-26 Thread YooKyuseok
I am Kyuseok in the Republic of Korea. Nice to meet you. I have been searching for 'solr boost term position' and struggling with this issue, and I still haven't solved it. I run a video content search server, and my client requires that contents which match the value at the start of the field should be

Filtering HTML content in Solr 4.0.0

2012-10-26 Thread Pratyul Kapoor
Hi, I am using Solr 4.0.0. I have HTML content as the description of a product. If I index it without any filtering, it gives errors on search. How can I filter HTML content? Pratyul

Re: [Solr boost] Date boost for certain query set

2012-10-26 Thread Alessandro Benedetti
I've made some progress. I'm writing a function to do this sort of clustered boosting: <str name="bf">product(recip(ms(NOW,data),3.16e-11,1,1),exists(query(field:value)))</str> I multiply the boost; for a specific value of some field, the exists function will return 0 or 1, and this would cancel or use

Re: Filtering HTML content in Solr 4.0.0

2012-10-26 Thread Rafał Kuć
Hello! You are trying to put the HTML into the XML sent to Solr, right? You should use the proper UTF-8 encoding to do that. For example, look at the utf8-example.xml file from the exampledocs directory that comes with Solr and you'll see something like this: <field name="features">tag with escaped chars:

Re: Filtering HTML content in Solr 4.0.0

2012-10-26 Thread Rogério Pereira Araújo
I think you will have to write an UpdateProcessor to strip out HTML tags. http://wiki.apache.org/solr/UpdateRequestProcessor As of Solr 4.0 you can also use scripting languages like Python, Ruby and JavaScript to write scripts for use as update processors. -Original Message-

RE: how solr to boost term value at the start of ther field?

2012-10-26 Thread Markus Jelsma
Hi, One trick is to index a special token at the beginning of the content and do a phrase query for your terms and the special token with little or no slop. You can also use Lucene's SpanFirstQuery but it's not yet exposed in Solr. There's a patch for trunk exposing the SpanFirstQuery in

Re: Filtering HTML content in Solr 4.0.0

2012-10-26 Thread Rafał Kuć
Hello! You don't need a custom update request processor - there is a char filter dedicated to strip HTML tags from your content and index only relevant parts of it - http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory However, you first need to properly

Re: Index-time field boosting

2012-10-26 Thread Otis Gospodnetic
Hi, Can you show us the configuration for your request handler from solrconfig.xml? Otis -- Performance Monitoring - http://sematext.com/spm On Oct 26, 2012 8:33 AM, vsl ociepa.pa...@gmail.com wrote: Hi, I have a problem with index time boosting. I created 4 new fields: field

Re: DIH update?

2012-10-26 Thread Billy Newman
Sorry, to be more specific I am referring to partial document update, which I believe is new to Solr 4. Also I am using a URLDataSource and I cannot use the delta-import feature, nor is it what I am looking for. I.E. DIH - creates: id: 12345 first: hello Runs again and pulls in the same doc

Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
I spoke too soon! Whereas three days ago when the index was new 500 records could be written to it in 3 seconds, now that operation is taking a minute and a half, sometimes longer. I ran optimize() but that did not help the writes. What can I do to improve the write performance? Even opening the

Re: Index-time field boosting

2012-10-26 Thread vsl
Hi, this is my request handler from solrconfig.xml: <requestHandler name="/select" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <int name="rows">10</int> <str name="df">text</str> </lst> </requestHandler> -- View this message in context:

Re: DIH nested entities don't work

2012-10-26 Thread Gora Mohanty
On 26 October 2012 13:14, mroosendaal mroosend...@yahoo.com wrote: Hi, I tried giving the pdt_id from the subentity a specific value and it worked. Only now every product has the same value. [...] No offence, but it is difficult to try to help you if you provide partial information, and keep

Re: Best way to commit data to Solr

2012-10-26 Thread Erick Erickson
Here's a great blog explaining one major difference between 3.x and 4.x: http://www.searchworkings.org/blog/-/blogs/gimme-all-resources-you-have-i-can-use-them!/ In a nutshell, 3.x blocks on segment merges (which can be triggered by commits). I've heard anecdotal accounts that pushing your

Problem with loading dictionary for Hunspell

2012-10-26 Thread Rob Koeling
I'm trying to employ the HunspellStemFilterFactory, but have trouble loading a dictionary. I downloaded the .dic and .aff file for en_GB, en_US and nl_NL from the OpenOffice site, but they all give me the same error message. When I use them AS IS, I get the error message: Oct 26, 2012 2:39:37

Re: DIH update?

2012-10-26 Thread Gora Mohanty
On 26 October 2012 18:45, Billy Newman newman...@gmail.com wrote: Sorry, to be more specific I am referring to partial document update, which I believe is new to Solr 4. Also I am using a URLDataSource and I cannot use the delta-import feature, nor is it what I am looking for. [...] Haven't

Re: Occasional Solr performance issues

2012-10-26 Thread Shawn Heisey
On 10/26/2012 7:16 AM, Dotan Cohen wrote: I spoke too soon! Whereas three days ago when the index was new 500 records could be written to it in 3 seconds, now that operation is taking a minute and a half, sometimes longer. I ran optimize() but that did not help the writes. What can I do to

Re: SolrCloud and distributed search

2012-10-26 Thread Bill Au
I am currently using one master with multiple slaves so I do have high availability for searching now. My index does fit on a single machine and a single query does not take too long to execute. But I do want to take advantage of high availability of indexing and real time replication. So it

Re: DIH nested entities don't work

2012-10-26 Thread mroosendaal
None taken:-) Here's the info OS: Linux Java: 1.6 Oracle driver: ojdbc14.jar *view structure:* END_FRG_PRODUCTS_VW pdt_id pdt_title stock_availability pdt_pce_bolprice gpc_id offer_type search_rank search_title

lukeall.jar for Solr4r?

2012-10-26 Thread Carrie Coy
Where can I get a copy of Luke capable of reading Solr4 indexes? My lukeall-4.0.0-ALPHA.jar no longer works. Thx, Carrie Coy

Re: SolrCloud and distributed search

2012-10-26 Thread Erick Erickson
Yes, I think SolrCloud makes sense with a single shard for exactly this reason, NRT and multiple replicas. I don't know how you'd get NRT on multiple machines without it. But do be aware of: https://issues.apache.org/jira/browse/SOLR-3971 A collection that is created with numShards=1 turns into a

Re: SolrCloud and distributed search

2012-10-26 Thread Tomás Fernández Löbbe
You should still use some kind of load balancer for searches, unless you use the CloudSolrServer (SolrJ) which includes the load balancing. Tomás On Fri, Oct 26, 2012 at 11:46 AM, Erick Erickson erickerick...@gmail.com wrote: Yes, I think SolrCloud makes sense with a single shard for exactly

Re: DIH nested entities don't work

2012-10-26 Thread Gora Mohanty
On 26 October 2012 20:01, mroosendaal mroosend...@yahoo.com wrote: None taken:-) Here's the info [...] The DIH configuration, and schema look fine. I have not used the new caching setup in 4.0 enough to comment on it. *what I've tried* * importing only the products -- worked * importing

Re: SolrCloud and distributed search

2012-10-26 Thread Bill Au
I am thinking of using a load balancer for both indexing and querying to spread both the indexing and querying load across all the machines. Bill On Fri, Oct 26, 2012 at 10:48 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: You should still use some kind of load balancer for searches,

Re: Occasional Solr performance issues

2012-10-26 Thread Dotan Cohen
On Fri, Oct 26, 2012 at 4:02 PM, Shawn Heisey s...@elyograg.org wrote: Taking all the information I've seen so far, my bet is on either cache warming or heap/GC trouble as the source of your problem. It's now specific information gathering time. Can you gather all the following information

Re: SolrCloud and distributed search

2012-10-26 Thread Tomás Fernández Löbbe
If you are going to use SolrJ, CloudSolrServer is even better than a round-robin load balancer for indexing, because it will send the documents straight to the shard leader (you save one internal request). If not, round-robin should be fine. Tomás On Fri, Oct 26, 2012 at 12:27 PM, Bill Au

Re: SolrCloud and distributed search

2012-10-26 Thread Yonik Seeley
On Fri, Oct 26, 2012 at 10:14 AM, Bill Au bill.w...@gmail.com wrote: I am currently using one master with multiple slaves so I do have high availability for searching now. My index does fit on a single machine and a single query does not take too long to execute. But I do want to take

Re: Search and Entity structure

2012-10-26 Thread v vijith
Thanks for the response. This workaround would be difficult to implement. Also I'm finding it very difficult to understand that SOLR doesn't provide this feature for searching. On Fri, Oct 26, 2012 at 9:42 AM, adityab aditya_ba...@yahoo.com wrote: Hi Vijith, See if this solution solves your

Re: Search and Entity structure

2012-10-26 Thread Gora Mohanty
On 25 October 2012 23:48, v vijith vvij...@gmail.com wrote: Dear All, Apologies for the lengthy email. SOLR Version: 4. I'm a newbie to SOLR and have gone through the tutorial but could not get a solution. The below requirement doesn't seem to be impossible, but I think I'm missing the obvious.

Re: Search and Entity structure

2012-10-26 Thread v vijith
The schema content that I have put in is <field name="EMPID" type="integer" indexed="true" stored="true" required="true" multiValued="false" /> <field name="empname" type="string" indexed="true" stored="true" /> <field name="gradeid" type="integer" indexed="true" stored="true"/> <field name="gradename" type="string"

Re: Occasional Solr performance issues

2012-10-26 Thread Shawn Heisey
On 10/26/2012 9:41 AM, Dotan Cohen wrote: On the dashboard of the GUI, it lists all the jvm arguments. Include those. Click Java Properties and gather the java.runtime.version and java.specification.vendor information. After one of the long update times, pause/stop your indexing application.

Re: [Solr boost] Date boost for certain query set

2012-10-26 Thread Jack Krupansky
The exists function returns a boolean - which you can use in an if function: if(exists(query(field:value)),recip(ms(NOW,data),3.16e-11,1,1),0) See: http://wiki.apache.org/solr/FunctionQuery#exists -- Jack Krupansky -Original Message- From: Alessandro Benedetti Sent: Friday, October
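
As a rough illustration of what the function in this reply computes: the Solr function query documentation defines recip(x,m,a,b) as a/(m*x+b), and 3.16e-11 is approximately one divided by the number of milliseconds in a year. The Python sketch below mimics that arithmetic outside Solr. The names date_boost and matches_query are hypothetical and stand in for the if(exists(query(field:value)), ..., 0) wrapper; this is not Solr code.

```python
# Sketch of the arithmetic behind
#   if(exists(query(field:value)), recip(ms(NOW,data),3.16e-11,1,1), 0)
# Assumption: recip(x, m, a, b) = a / (m * x + b), as in the Solr docs.

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # about 3.156e10 milliseconds


def recip(x, m, a, b):
    """Reciprocal function: a / (m * x + b)."""
    return a / (m * x + b)


def date_boost(age_ms, matches_query):
    """Hypothetical helper: date boost only for documents matching the query."""
    if matches_query:
        return recip(age_ms, 3.16e-11, 1, 1)
    return 0.0


brand_new = date_boost(0, True)                # document dated NOW
one_year_old = date_boost(MS_PER_YEAR, True)   # document about a year old
not_matching = date_boost(MS_PER_YEAR, False)  # query(field:value) has no match
```

With these constants a brand-new matching document gets the full boost, a matching document about a year old gets roughly half of it, and a non-matching document gets none.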

Re: Index-time field boosting

2012-10-26 Thread Otis Gospodnetic
Hi, Have a look at http://search-lucene.com/?q=extendeddismax. Use qf param in edismax to assign weights. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Fri, Oct 26, 2012 at 9:19 AM, vsl

Re: Get metadata for query

2012-10-26 Thread Otis Gospodnetic
Hi, No... but you could simply query your index, get all the fields you need and process them to get what you need. Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Fri, Oct 26, 2012 at 10:19 AM, Torben

edismax bq, ignore tf/idf?

2012-10-26 Thread Ryan McKinley
Hi- I am trying to add a setting that will boost results based on existence in different buckets. Using edismax, I added the bq parameter: location:A^5 location:B^3 I want this to put everything in location A above everything in location B. This mostly works, BUT depending on the number of

Re: edismax bq, ignore tf/idf?

2012-10-26 Thread Jack Krupansky
How about a boost function, bf or boost? bf=if(exists(query(location:A)),5,if(exists(query(location:B)),3,0)) Use bf if you want to add to the score, boost if you want to multiply the score -- Jack Krupansky -Original Message- From: Ryan McKinley Sent: Friday, October 26, 2012
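
The difference between bf and boost described in this reply can be sketched in a few lines of Python. This is a deliberately simplified model that treats the relevance score as a single number; the names location_bonus, with_bf and with_boost are hypothetical, and real Solr scoring involves more than this.

```python
# Simplified model of
#   if(exists(query(location:A)),5,if(exists(query(location:B)),3,0))
# applied additively (bf) or multiplicatively (boost).


def location_bonus(location):
    """Value of the nested if(...) function for a document's location."""
    if location == "A":
        return 5.0
    if location == "B":
        return 3.0
    return 0.0


def with_bf(base_score, location):
    """bf: the function value is added to the score."""
    return base_score + location_bonus(location)


def with_boost(base_score, location):
    """boost: the score is multiplied by the function value."""
    return base_score * location_bonus(location)
```

Note that in this model a multiplicative boost of 0 zeroes the score of documents in neither location, while the additive form leaves their base score unchanged.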

Re: edismax bq, ignore tf/idf?

2012-10-26 Thread Chris Hostetter
: How about a boost function, bf or boost? : : bf=if(exists(query(location:A)),5,if(exists(query(location:B)),3,0)) Right ... assuming you only want to ignore tf/idf on these fields in this specific context, function queries are the way to go -- otherwise you could just use a per-field

Re: SolrJ missing CollectionAdmin Api to create new collections dynamically

2012-10-26 Thread Chris Hostetter
: I can't find a good way to create a new Collection with SolrJ. : I need to create my Collections dynamically and at the moment the only way I : see is to call the CollectionAdmin with a HTTP Call directly to any of my : SolrServers. : : I don't like this because I think its a better way only

Re: DIH throws NullPointerException when using dataimporter.functions.escapeSql with parent entities

2012-10-26 Thread Lance Norskog
Which database rows cause the problem? The bug report talks about fields with an empty string. Do your rows have empty string values? - Original Message - | From: Dominik Siebel m...@dsiebel.de | To: solr-user@lucene.apache.org | Sent: Monday, October 22, 2012 3:15:29 AM | Subject: Re:

Re: edismax bq, ignore tf/idf?

2012-10-26 Thread Ryan McKinley
thanks! On Fri, Oct 26, 2012 at 4:20 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : How about a boost function, bf or boost? : : bf=if(exists(query(location:A)),5,if(exists(query(location:B)),3,0)) Right ... assuming you only want to ignore tf/idf on these fields in this specific

Re: Index-time field boosting

2012-10-26 Thread Chris Hostetter
: I have a problem with index time boosting. I created 4 new fields: I think you are misunderstanding the meaning of index time boosting vs query time boosting. First of all, this is not meaningful syntax in your schema.xml... : <field name="title" type="text_general" indexed="true"

Re: Get metadata for query

2012-10-26 Thread Lance Norskog
Ah, there's the problem- what is a fast way to fetch all fields in a collection, including dynamic fields? - Original Message - | From: Otis Gospodnetic otis.gospodne...@gmail.com | To: solr-user@lucene.apache.org | Sent: Friday, October 26, 2012 3:05:04 PM | Subject: Re: Get metadata

Any way to by pass the checking on QueryElevationComponent

2012-10-26 Thread James Ji
Hi there We are currently working on having Solr files read from HDFS. We extended some of the classes so as to avoid modifying the original Solr code and make it compatible with the future release. So here comes the question, I found in QueryElevationComponent, there is a piece of code checking

Re: Search and Entity structure

2012-10-26 Thread Lance Norskog
A side point: in fact, the connection between MBA and grade is not lost. The values in a multi-valued field are stored in order. You can have separate multi-valued fields with matching entries, and the values will be fetched in order and you can match them by counting. This is not database-ish,
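
The counting approach described here can be sketched on the client side in Python. The document below is made up for illustration: the field names qualification and grade and the helper pair_by_position are hypothetical, and the values are borrowed from the MBA/LEAD example earlier in the thread. The sketch assumes both multi-valued fields were indexed with the same number of entries in the same order.

```python
# A document as a client might receive it, with two parallel
# multi-valued fields whose entries correspond by position.
doc = {
    "qualification": ["MBA", "LEAD"],
    "grade": ["A", "B"],
}


def pair_by_position(document, left_field, right_field):
    """Match entries of two multi-valued fields by their index."""
    return list(zip(document[left_field], document[right_field]))


pairs = pair_by_position(doc, "qualification", "grade")
```

Note that zip stops at the shorter list, so a missing value in one field would silently shift or drop pairs; a placeholder for empty entries at index time avoids that.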

Re: Get metadata for query

2012-10-26 Thread Jack Krupansky
I'm not sure I understand the real question here. What is the metadata? I mean, q=x&fl=* gives you all the (stored) fields for documents matching the query. What else is there? -- Jack Krupansky -Original Message- From: Lance Norskog Sent: Friday, October 26, 2012 9:42 PM To:

Re: SolrJ missing CollectionAdmin Api to create new collections dynamically

2012-10-26 Thread Markus.Mirsberger
Yes, thanks. But how can I check the status of a collection? The STATUS action does not exist in the CollectionAdmin, only in the CoreAdmin. At the moment probably the only way to get information about this is somehow through the ZkStateReader? Regards, Markus On 27.10.2012 06:37, Chris Hostetter

Re: Search and Entity structure

2012-10-26 Thread Gora Mohanty
On 27 October 2012 07:55, Lance Norskog goks...@gmail.com wrote: A side point: in fact, the connection between MBA and grade is not lost. The values in a multi-valued field are stored in order. You can have separate multi-valued fields with matching entries, and the values will be fetched in

Re: Search and Entity structure

2012-10-26 Thread Gora Mohanty
On 27 October 2012 01:20, v vijith vvij...@gmail.com wrote: [...] The dataconfig file is <document> <entity name="employee" query="select * from employee"> <entity name="qualification" query="select * from qualification where empid='${employee.EMPID}'"/> </entity> </document>