Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Aleksander Akerø
Tried the following config for setting autoGeneratePhraseQueries, but it didn't seem to change anything. Tested both true and false. <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"> <analyzer type="index"> <tokenizer class=

Lucene Join

2014-01-30 Thread anand chandak
Hi, I am trying to find out whether Lucene joins (not the Solr join) use any filter cache. The API that Lucene uses for joining is JoinUtil.createJoinQuery(); where can I find the source code for this API? Thanks in advance. Thanks, Anand

ant eclipse hangs - branch_4x

2014-01-30 Thread Per Steffensen
Hi, I used to be able to successfully run ant eclipse from branch_4x. With the newest code (tip of branch_4x today) I can't. ant eclipse hangs forever at the point shown by the console output below. I noticed that this problem has been around for a while - not something that happened

Re: Lucene Join

2014-01-30 Thread Michael McCandless
Look in Lucene's join module? Mike McCandless http://blog.mikemccandless.com On Thu, Jan 30, 2014 at 4:15 AM, anand chandak anand.chan...@oracle.com wrote: Hi, I am trying to find out whether Lucene joins (not the Solr join) use any filter cache. The API that Lucene uses is for

Concurrency handling in DataImportHandler

2014-01-30 Thread Dileepa Jayakody
Hi All, Could someone explain how concurrency is handled in the DIH? What happens if multiple /dataimport requests are issued to the same data source? I'm doing some custom processing at the end of the dataimport process in an EventListener configured in data-config.xml as below. <document

Re: Concurrency handling in DataImportHandler

2014-01-30 Thread Dileepa Jayakody
I would particularly like to know how DIH handles concurrency in JDBC database connections during dataimport. <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/solrtest" user="usr1" password="123" batchSize="1" /> Thanks, Dileepa On Thu, Jan 30, 2014 at 4:05

Re: Concurrency handling in DataImportHandler

2014-01-30 Thread Dileepa Jayakody
Hi All, I triggered a /dataimport for the first 100 rows from my database and, while it was running, issued another import request for rows 101-200. In my log I see the exception below. It seems multiple JDBC connections cannot be opened. Does this mean concurrency is not supported in DIH for JDBC

Re: Use a field without predefining it in the schema

2014-01-30 Thread Hakim Benoudjit
Thanks, that's a good feature since I don't have to reindex all the data or restart the Solr app. 2014-01-30 Steve Rowe sar...@gmail.com Hakim, All the fields you have added manually to the schema will be kept when you switch to using managed schema. From the managed schema page on the

Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Srinivasa7
Hi, I have a similar kind of problem: I want to search for words that contain spaces, and I want to search with all the spaces stripped. I have used the following schema for that: <fieldType name="nospaces" class="solr.TextField" autoGeneratePhraseQueries="true"> <analyzer type="index">

Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Srinivasa7
Aleksander Akerø, it would be great if you could share how you are handling it on a per-field basis.

Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Jack Krupansky
The standard, keyword-oriented query parsers will all treat unquoted, unescaped white space as term delimiters and ignore the white space. There is no way to bypass that behavior. So, your regex will never even see the white space - unless you enclose the text and white space in quotes or use a
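Editor's illustration of the point above: a minimal sketch in Python, not Solr's actual parser. The function name `split_query_terms` is hypothetical, and the regular expression is a toy model of "unquoted, unescaped white space delimits terms", so it ignores operators, field prefixes, and everything else a real query parser handles.

```python
import re
from typing import List


def split_query_terms(query: str) -> List[str]:
    """Toy model of a keyword-oriented query parser's first step.

    Unquoted, unescaped whitespace separates terms and is discarded.
    A double-quoted phrase or a backslash-escaped space stays in one piece,
    so only then would a later analysis step ever see the whitespace.
    """
    # Either a whole "quoted phrase", or a run of escaped characters and
    # characters that are not whitespace, quotes, or backslashes.
    pattern = r'"[^"]*"|(?:\\.|[^\s"\\])+'
    return re.findall(pattern, query)


if __name__ == "__main__":
    print(split_query_terms('foo bar'))      # two separate terms
    print(split_query_terms('"foo bar"'))    # one quoted phrase
    print(split_query_terms(r'foo\ bar'))    # one term with an escaped space
```

In this model the analyzer only ever receives the pieces returned by the split, which is why a pattern replacement that targets spaces has nothing to act on unless the input was quoted or escaped.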

Re: ant eclipse hangs - branch_4x

2014-01-30 Thread Steve Rowe
Hi Per, You may be seeing the stale-Ivy-lock problem (see IVY-1388). LUCENE-4636 upgraded the bootstrapped Ivy to 2.3.0 to reduce the likelihood of this problem, so the first thing is to make sure you have that version in ~/.ant/lib/ - if not, remove the Ivy jar that’s there and run ‘ant

Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Aleksander Akerø
Hi Srinivasa, Yes, I've come to understand that the analyzers will never see the whitespace, thus no need for pattern replacement, like Jack points out. So the solution would be to set which parser to use for the query. Also Jack has pointed out that the field queryparser should work in this

Re: Not finding part of fulltext field when word ends in dot

2014-01-30 Thread Jack Krupansky
The word delimiter filter will turn 26KA into two tokens, as if you had written 26 KA without the quotes. The autoGeneratePhraseQueries option will cause the multiple terms to be treated as if they actually were enclosed within quotes, otherwise they will be treated as separate and unquoted
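Editor's illustration of the behaviour described above: a rough Python sketch, not the word delimiter filter itself. Both function names are hypothetical, and the split rule (break at letter/digit boundaries) is an assumption that covers only the 26KA case mentioned in the message.

```python
import re
from typing import List


def split_letters_and_digits(token: str) -> List[str]:
    """Toy stand-in for a word delimiter step: break a token wherever
    a run of digits meets a run of ASCII letters."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


def build_query(token: str, auto_generate_phrase_queries: bool) -> str:
    """Show the two outcomes the message describes.

    With the option on, the parts behave as if they had been quoted
    (a phrase); with it off, they are separate, unquoted terms.
    """
    parts = split_letters_and_digits(token)
    if len(parts) > 1 and auto_generate_phrase_queries:
        return '"' + " ".join(parts) + '"'
    return " ".join(parts)


if __name__ == "__main__":
    print(split_letters_and_digits("26KA"))
    print(build_query("26KA", auto_generate_phrase_queries=True))
    print(build_query("26KA", auto_generate_phrase_queries=False))
```

The difference matters for matching: the phrase form requires the two parts to be adjacent in the document, while the unquoted form lets each part match on its own.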

Re: how to write an efficient query with a subquery to restrict the search space?

2014-01-30 Thread Jack Krupansky
Lucene's default scoring should give you much of what you want - ranking hits of low-frequency terms higher - without any special query syntax - just list out your terms and use OR as your default operator. -- Jack Krupansky -Original Message- From: svante karlsson Sent: Thursday,
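Editor's illustration of why rare terms rank higher: a simplified Python sketch. The idf form used here, 1 + ln(numDocs / (docFreq + 1)), is the classic TF-IDF one; treating it as what the default similarity computed at the time is an assumption, and the real score also includes term frequency, norms, and coordination factors that this sketch leaves out. The index statistics in the example are made up.

```python
import math
from typing import Dict, List, Set


def idf(doc_freq: int, num_docs: int) -> float:
    """Classic inverse document frequency: rarer terms get larger weights."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def or_score(query_terms: List[str], doc_terms: Set[str],
             doc_freqs: Dict[str, int], num_docs: int) -> float:
    """Simplified OR scoring: add the idf of every query term the document has."""
    return sum(idf(doc_freqs[t], num_docs) for t in query_terms if t in doc_terms)


if __name__ == "__main__":
    num_docs = 1000
    doc_freqs = {"common": 500, "rare": 10}   # hypothetical statistics
    query = ["common", "rare"]
    print(or_score(query, {"common"}, doc_freqs, num_docs))
    print(or_score(query, {"rare"}, doc_freqs, num_docs))
```

With OR as the default operator, a document matching only the rare term outscores one matching only the common term, and a document matching both outscores either, which is the ranking effect the message refers to.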

SOLR suggester with highlighting

2014-01-30 Thread Jorge Sanchez
Hello, I am trying to build a typeahead autocomplete with Solr using the suggester. The search will be done for users and for group names which aggregate users. The search will be done on usernames, bio, web page and other fields. What I want to achieve is a Facebook- or Twitter-like search.

Re: Solr middle-ware?

2014-01-30 Thread Jack Krupansky
It would be great if an example were available as part of the Solr release. Please file a Jira request. Maybe this could be one of the GSOC (Google Summer of Code) projects, or maybe somebody/everybody could submit their search middleware code as possible examples, attached to the Jira, so that

Re: Regarding Solr Faceting on the query response.

2014-01-30 Thread Alexei Martchenko
I believe it's not possible to facet only on the page you are on; faceting is supposed to work only with the full result set. I have never tried, but I've never seen a way this could be done. alexei martchenko

Re: Solr middle-ware?

2014-01-30 Thread Furkan KAMACI
Hi; If you need this kind of thing and you/we can define the requirements, I can contribute it to Solr as part of GSOC. Thanks; Furkan KAMACI 2014-01-30 Jack Krupansky j...@basetechnology.com: It would be great if an example were available as part of the Solr release. Please file a Jira

Re: high memory usage with small data set

2014-01-30 Thread Erick Erickson
Do your used entries in your caches increase in parallel? This would be the case if you aren't updating your index and would explain it. BTW, take a look at your cache statistics (from the admin page) and look at the cache hit ratios. If they are very small (and my guess is that with 1,500 boolean

Re: 4.6 Core Discovery coreRootDirectory not working

2014-01-30 Thread Erick Erickson
I'm traveling and can't pursue this right now, but a couple of questions: /home/user1/solr/core.properties exists in all these cases, right? Tangential, but I'd be very cautious about setting core root the way you are, since it'll walk each and every directory under /home looking for cores.

Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Erick Erickson
Note, the comments about LowerCaseTokenizer were a red herring. You were using LowerCaseFilterFactory. Note: Filter rather than Tokenizer. So it would just do what you expected, lowercase the entire input. You would have used LowerCaseTokenizerFactory in place of KeywordTokenizerFactory, not as a
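Editor's illustration of the filter-versus-tokenizer distinction: a toy Python sketch, not the Lucene classes. The two function names are hypothetical. The sketch assumes that a keyword tokenizer emits the whole input as one token and that a lowercase tokenizer splits at non-letter characters before lowercasing.

```python
import re
from typing import List


def keyword_tokenizer_then_lowercase_filter(text: str) -> List[str]:
    """Keyword tokenizer keeps the entire input as a single token;
    the lowercase filter then lowercases that one token."""
    return [text.lower()]


def lowercase_tokenizer(text: str) -> List[str]:
    """A tokenizer that both splits and lowercases: tokens are runs of
    letters, so spaces, digits, and punctuation act as delimiters."""
    return re.findall(r"[^\W\d_]+", text.lower())


if __name__ == "__main__":
    sample = "Foo Bar-2"
    print(keyword_tokenizer_then_lowercase_filter(sample))
    print(lowercase_tokenizer(sample))
```

For exact matching on a whole field value, only the first chain preserves the value as one token; swapping in the tokenizer variant would break it into pieces.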

Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Aleksander Akerø
Yes, I actually noted that about the filter vs. tokenizer. It's easy to get confused if you don't have a good understanding of the differences between tokenizers and filters. As for the query parser problem, there's always a workaround, but it was nice to be made aware of. It sort of was a

SolR performance problem

2014-01-30 Thread MayurPanchal
Hi, I am working with Solr 4.2.1 on Jetty and we are facing some performance issues as well as heap memory overflow issues. I am searching for the actual cause of these exceptions, so I ran a load test with different Solr queries. After a few minutes I got the errors below. WARN:oejs.Response:Committed

Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Jack Krupansky
I vaguely recall that there was a Jira floating around for multi-word synonyms that dealt with parsing of spaces as well. And Robert Muir has (repeatedly) referred to this query parser feature as a bug. Somehow, eventually, I think it will be dealt with, but the difficulty remains for now.

Re: ant eclipse hangs - branch_4x

2014-01-30 Thread Per Steffensen
Hi, I used Ivy 2.2.0. Upgraded to 2.3.0. Didn't help. No lck files found in ~/.ivy2/cache, so nothing to delete. Deleted the entire ~/.ivy2/cache folder. Didn't help. Debugged a little and found that it was hanging due to org.apache.hadoop dependencies in solr/core/ivy.xml - if I commented out

Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Aleksander Akerø
I've come across something like this as well, can't remember where, but it was often related to synonym functionality. The following link shows a 3rd party QueryParser that seems to deal with synonyms alongside edismax, and may be interesting to look at: http://wiki.apache.org/solr/QueryParser

Error when restarting solr servers

2014-01-30 Thread lansing
Hello, I am running SolrCloud with 2 collections, with 5 shards and 3 replicas for each collection, and 5 ZooKeeper instances. solr-4.6.0 apache-tomcat-7.0.39 zookeeper-3.4.5 jre1.7.0_21 When I try to restart a Solr server in my SolrCloud I receive these errors: 1861449

Re: Required local configuration with ZK solr.xml?

2014-01-30 Thread Shawn Heisey
On 1/29/2014 12:48 PM, Jeff Wartes wrote: And that, I think, is my misunderstanding. I had assumed that the link between a node and the collections it belongs to would be the (possibly chroot'ed) zookeeper reference *itself*, not the node's directory structure. Instead, it appears that ZK is

Re: Required local configuration with ZK solr.xml?

2014-01-30 Thread Jeff Wartes
Work is underway towards a new mode where zookeeper is the ultimate source of truth, and each node will behave accordingly to implement and maintain that truth. I can't seem to locate a Jira issue for it, unfortunately. It's possible that one doesn't exist yet, or that it has an obscure title.

Re: Regarding Solr Faceting on the query response.

2014-01-30 Thread Kuchekar
Hi Mikhail, I would like my faceting to run only on my resultset returned as in only on numFound, rather than the whole index. In the example, even when I specify the query 'company:Apple' .. it gives me faceted results for other companies. This means that it is querying against

RES: Regarding Solr Faceting on the query response.

2014-01-30 Thread Felipe Dantas de Souza Paiva
Hi Nilesh, maybe faceting is not the right thing for you, because 'faceting is the arrangement of search results into categories based on indexed terms' (https://cwiki.apache.org/confluence/display/solr/Faceting). Perhaps you could use Result Clustering

Adding DocValues in an existing field

2014-01-30 Thread yriveiro
Hi, Can I add the docValues feature to an existing field without wiping the existing data? The modification to the schema would be something like this: <field name="surrogate_id" type="tlong" indexed="true" stored="true" multiValued="false" /> <field name="surrogate_id" type="tlong" indexed="true" stored="true"

Geospatial clustering + zoom in/out help

2014-01-30 Thread Bojan Šmid
Hi, I have an index with 300K docs with lat,lon. I need to cluster the docs based on lat,lon for display in the UI. The user then needs to be able to click on any cluster and zoom in (up to 11 levels deep). I'm using Solr 4.6 and I'm wondering how best to implement this efficiently? A bit more

Is there a way to get Solr to delete an uploaded document after its been indexed?

2014-01-30 Thread eShard
Hi, My crawler uploads all the documents to Solr for indexing to a tomcat/temp folder. Over time this folder grows so large that I run out of disk space. So, I wrote a bash script to delete the files and put it in the crontab. However, if I delete the docs too soon, they don't get indexed; too

Re: Required local configuration with ZK solr.xml?

2014-01-30 Thread Jeff Wartes
Found it. In case anyone else cares, this appears to be the root issue: https://issues.apache.org/jira/browse/SOLR-5128 Thanks again. On 1/30/14, 9:01 AM, Jeff Wartes jwar...@whitepages.com wrote: Work is underway towards a new mode where zookeeper is the ultimate source of truth, and each

JVM heap constraints and garbage collection

2014-01-30 Thread Joseph Hagerty
Greetings esteemed Solr-ites, I'm using Solr 3.5 over Tomcat 6. My index has reached 30G. Since my average load during peak hours is becoming quite high, and since I'm finally starting to notice a little bit of performance degradation and intermittent errors (e.g. Solr returned response 0 on

TemplateTransformer returns null values

2014-01-30 Thread tom
Hi, I am trying a simple transformer on data input using DIH, Solr 4.6. When I run the query below during DIH, I get null values for new_url. What is wrong? I even tried with ${document_solr.id}. the name is data-config.xml: <entity name="document_solr"

Re: Is there a way to get Solr to delete an uploaded document after its been indexed?

2014-01-30 Thread Alexandre Rafalovitch
Well, it's your crawler that submits them, so the crawler should know when to delete them. If you want some sort of trigger from Solr, look at postCommit hook defined in solrconfig.xml. Though all that gives you is timing, not which documents to deal with. You could probably also plug into

Re: TemplateTransformer returns null values

2014-01-30 Thread Alexandre Rafalovitch
I think you have a double mapping there: *) select DOC_IDN as id *) <field column="DOC_IDN" name="id" /> Both map DOC_IDN to id, possibly with the second overriding (or shadowing) the first. Try dropping the 'as' part in the select and then look for .id. Or keep the 'as' part and just have explicit field

Re: TemplateTransformer returns null values

2014-01-30 Thread tom
Thanks, Alexandre, for the quick response. I tried both ways but still no luck, null values. Is there anything I am doing fundamentally wrong? query="select DOC_IDN, BILL_IDN from document_fact" <field column="DOC_IDN" name="id" /> and query="select DOC_IDN as id, BILL_IDN as bill_id

Re: Boosting documents by categorical preferences

2014-01-30 Thread Amit Nithian
Chris, Sounds good! Thanks for the tips. I'll be glad to submit my talk to this as I have a writeup pretty much ready to go. Cheers Amit On Tue, Jan 28, 2014 at 11:24 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : The initial results seem to be kinda promising... of course there are

Re: JVM heap constraints and garbage collection

2014-01-30 Thread Shawn Heisey
On 1/30/2014 3:20 PM, Joseph Hagerty wrote: I'm using Solr 3.5 over Tomcat 6. My index has reached 30G. snip - The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM One detail that you did not provide was how much of your 7.5GB RAM you are allocating to the Java heap for

Re: TemplateTransformer returns null values

2014-01-30 Thread Alexandre Rafalovitch
Hmm, Try the variable reference without scope: ${id}. I can't remember if the scope is required only for higher level items. It might also be worth writing a very basic All fields logger to see what your in-progress map looks like. Regards, Alex. Personal website:

Re: Regarding Solr Faceting on the query response.

2014-01-30 Thread Alexandre Rafalovitch
Hi Nilesh, I am not sure the faceting code does what you think it does. However, there are different options and you can experiment with whichever one is best for you. They are controlled by the facet.method parameter: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method Regards,