Lucene 3.4.0 Merging

2011-09-30 Thread Ahson Iqbal
Hi, I have three Solr 3.4.0 indexes I want to merge. After searching the web I found that there are two ways to do it: 1. using the Lucene merge tool, or 2. merging through core admin. I am using the first method; for this I downloaded Lucene 3.4.0, unpacked it, and then ran the following command on

Re: Indexing geohash in solrj - Multivalued spatial search

2011-09-30 Thread Alessandro Benedetti
OK, this is good news! I will integrate the jar and test the feature :) Only one question regarding the indexing process in SolrJ: can we index the location data in the format lat,lon in the geohash field, or must we encode lat/lon into the geohash string and then index the encoded string? Thank

Re: how to implement search including special characters?

2011-09-30 Thread Ahmet Arslan
        In my application I need to implement search functionality that includes special characters. Ex. 1: if I enter the search term (PAGE) I get results, but if I enter the search term () I get an error. Ex. 2: even if I enter {, }, @, $, or %, I need to get search results those

Weird issues when upgrading from 1.4 to 3.4

2011-09-30 Thread Willem Basson
Hi there, We are currently upgrading from Solr 1.4 to 3.4 and have seen some issues with our specific use case. As background: we drop the whole index and then add all our documents in one big build before we commit and then optimise. This way we can revert the build if there are any issues and

Suggestions on how to perform infrastructure migration from 1.4 to 3.4?

2011-09-30 Thread Pranav Prakash
Hi List, our production search infrastructure is one indexing master and two identical twin serving slaves, all Solr 1.4 beasts. Apart from this we have one beast on Solr 3.4, which we have benchmarked against our production setup (for performance and relevancy) and would like to

How to skip current document to index data from DIP

2011-09-30 Thread scorpking
Hi, can anyone help me with this problem? I'm using Tika to index data from rich documents, indexing via HTTP request. I query the database to get fields and then combine them with Tika. Everything is OK, but I run into a FileNotFoundException. I know this error, but I want to skip

heap size problem when indexing files with solrj

2011-09-30 Thread hadi
I wrote a simple program with SolrJ that indexes files, but after a minute it crashed and *java.lang.OutOfMemoryError: Java heap space* appeared. I used Eclipse; my memory is about 2GB, and I set -Xms1024M -Xmx2048M for both the VM args of Tomcat and my application in Debug

Re: how to implement search including special characters?

2011-09-30 Thread nagarjuna
Yes iorixxx... I found that Lucene query parser class http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/queryParser/QueryParser.html here. Can you please explain how I have to use this class in my Solr config files? I already added the following classes in my schema.xml file filter

Re: SOLR Index Speed

2011-09-30 Thread Lord Khan Han
Any idea? On Thu, Sep 29, 2011 at 1:53 PM, Lord Khan Han khanuniver...@gmail.com wrote: Hi, the no-op run completed in 20 minutes; the only commented-out line was solr.addBean(doc). We've tried SUSS as a drop-in replacement for CommonsHttpSolrServer but its behavior was weird. We have seen 10Ks

skipping parts of query analysis for some queries

2011-09-30 Thread Bernd Fehling
I need to skip some query analysis steps for some queries; or, more precisely, to make them switchable with a query parameter. Use case: fieldType name=text_spec class=solr.TextField positionIncrementGap=100 autoGeneratePhraseQueries=true analyzer type=index charFilter

Re: Weird issues when upgrading from 1.4 to 3.4

2011-09-30 Thread Willem Basson
Just to clarify, I'm not worried about the virtual memory getting bigger; the issue is that after doing a lot of adds without a commit, performance dramatically decreases until we do the commit. This didn't use to be a problem with 1.4. Willem On Fri, Sep 30, 2011 at 10:20 AM, Willem Basson

Difference b/w SimplepostTool code and posting the file using SOLRJ

2011-09-30 Thread kiran.bodigam
We can post documents from the command line by running the post.jar file and giving the list of *.xml files to Solr to index. Here we are posting XML documents which have a particular format; I would like to know what advantages I get from this format? adddoc

Re: Indexing geohash in solrj - Multivalued spatial search

2011-09-30 Thread Smiley, David W.
On Sep 30, 2011, at 4:14 AM, Alessandro Benedetti wrote: OK, this is good news! I will integrate the jar and test the feature :) Only one question regarding the indexing process in SolrJ: can we index the location data in the format lat,lon in the geohash field, or must we encode lat/lon
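The encoding question above comes down to what a geohash actually is: latitude and longitude bits interleaved and written out in a base-32 alphabet. A minimal standalone sketch of the standard algorithm (class and method names are illustrative, independent of the Solr field type being discussed):

```java
public class GeohashEncoder {
    // Standard geohash base-32 alphabet (no a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    public static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // geohash starts with a longitude bit
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            if (evenBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { // every 5 bits becomes one base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }
}
```

The canonical example point 42.6, -5.6 encodes to "ezs42" at precision 5. Whether Solr's geohash field accepts raw "lat,lon" input or requires this pre-encoded string is exactly the question posed in the thread.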

Re: how to implement search including special characters?

2011-09-30 Thread Ahmet Arslan
Yes iorixxx... I found that Lucene query parser class http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/queryParser/QueryParser.html here. Can you please explain how I have to use this class in my Solr config files? I already added the following classes in my schema.xml file

Re: Difference b/w SimplepostTool code and posting the file using SOLRJ

2011-09-30 Thread simon
Well, the Solr XML format is the only format which Solr's XML update handler knows about, and it's been baked into Solr from the beginning. That said, there is now an XSLTUpdateRequestHandler in trunk and 3.4, which allows the specification of an XSLT transform to convert an arbitrary XML schema

Re: Do more fields cause more memory usage?

2011-09-30 Thread simon
More fields will cause some increase in memory use, but it's hard to quantify, and it will also depend on how much use the queries can make of Solr's caches and on the number of simultaneous queries. In general, the number of query fields has more impact on performance. A potentially big memory sink,

I think I've found a bug with filter queries and joins

2011-09-30 Thread Jason Toy
I'm testing out the join functionality on svn revision 1175424. I've found that when I add a single filter query to a join it works fine, but when I do more than one filter query, the query does not return results. This single function query with a join returns results:

Re: How to skip current document to index data from DIP

2011-09-30 Thread Ahmet Arslan
        Can anyone help me with this problem? I'm using Tika to index data from rich documents, indexing via HTTP request. I query the database to get fields and then combine them with Tika. Everything is OK, but I run into a FileNotFoundException. I know this error, but I want to skip

Re: basic solr cloud questions

2011-09-30 Thread Pulkit Singhal
SOLR-2355 is definitely a step in the right direction, but something I would like to get clarified: a) There were some fixes to it that went onto the 3.4/3.5 branch, based on the comments section... are they not available, or not needed, on 4.x trunk? b) Does this basic implementation distribute

Re: basic solr cloud questions

2011-09-30 Thread Pulkit Singhal
BTW I updated the wiki with the following; hope it keeps it simple for others starting out: Example B: Simple two-shard cluster with shard replicas. Note: this setup leverages copy/paste to set up two cores per shard, and distributed searches validate successful completion of this example/exercise.

Re: MMapDirectory failed to map a 23G compound index segment

2011-09-30 Thread Rich Cariens
My colleague and I thought the same thing - that this is an O/S configuration issue. /proc/sys/vm/max_map_count = 65536 I honestly don't know how many segments were in the index. Our merge factor is 10 and there were around 4.4 million docs indexed. The OOME was raised when the MMapDirectory was
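If the 65536 default shown above is the culprit, the limit is adjustable via sysctl on Linux (the target value here is illustrative; the appropriate number depends on segment count and heap layout):

```shell
# Inspect the current per-process memory-map limit
cat /proc/sys/vm/max_map_count

# Raise it for the running system (root required);
# add vm.max_map_count to /etc/sysctl.conf to persist across reboots
sysctl -w vm.max_map_count=262144
```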

Re: basic solr cloud questions

2011-09-30 Thread Mark Miller
Thanks Pulkit! I'd actually been meaning to add the post.jar commands needed to index a doc to each shard to the wiki. Waiting till I streamline a few things though. - Mark On Sep 30, 2011, at 12:35 PM, Pulkit Singhal wrote: BTW I updated the wiki with the following, hope it keeps it simple

RE: Lucene 3.4.0 Merging

2011-09-30 Thread Steven A Rowe
Hi Ahson, The wiki page you got your cmdline invocation from http://wiki.apache.org/solr/MergingSolrIndexes was missing a space character between the classpath and org/apache/lucene/misc/IndexMergeTool. I've just updated that page. Steve -Original Message- From: Ahson Iqbal
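For reference, the corrected invocation described on that wiki page has the general shape below (jar names and index paths are illustrative; the first directory argument is the merge target, the rest are sources):

```shell
java -cp lucene-core-3.4.0.jar:lucene-misc-3.4.0.jar \
  org.apache.lucene.misc.IndexMergeTool \
  /path/to/merged-index /path/to/index1 /path/to/index2
```

Note the space before the class name, which was the character missing from the wiki page; without it the classpath and class name run together and the JVM reports a class-not-found error.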

Re: basic solr cloud questions

2011-09-30 Thread Yury Kats
On 9/30/2011 12:26 PM, Pulkit Singhal wrote: SOLR-2355 is definitely a step in the right direction but something I would like to get clarified: Questions about SOLR-2355 are best asked in SOLR-2355 :) b) Does this basic implementation distribute across shards or across cores? From a brief

Re: query cache result

2011-09-30 Thread Tomás Fernández Löbbe
I just read this response, sorry. I think this is not possible OOTB. On Sat, Aug 20, 2011 at 4:30 PM, jame vaalet jamevaa...@gmail.com wrote: Thanks Tomás... can we set the querywindowsize of a particular query through the URL? Say I want only a particular query's result set to be cached and not

Multithreaded JdbcDataStore and CachedSqlEntityProcessor

2011-09-30 Thread Maria Vazquez
Hi, I'm using threads with JdbcDataStore and CachedSqlEntityProcessor. I noticed that if I make it single-threaded, CachedSqlEntityProcessor behaves as expected (it only queries the db once and caches all the rows). If I make it multi-threaded it seems to make multiple db queries, and when I debug

Re: Automate startup/shutdown of SolrCloud Shards

2011-09-30 Thread Mark Miller
On Sep 29, 2011, at 1:59 PM, Jamie Johnson wrote: I am trying to automate the startup/shutdown of SolrCloud shards and have noticed that there is a bit of a timing issue where, if the server which is to bootstrap ZK with the configs does not complete its process (i.e. there is no data at the

DataImportHandler frequency

2011-09-30 Thread tech20nn
I am working with a transactional system on an RDBMS. We have approximately 200 db transactions per minute at peak usage. I am planning to use Solr for indexing various parts of the data. I want to keep the Solr indexes as up to date as possible, so that the search results returned to users are as fresh and

Re: DataImportHandler frequency

2011-09-30 Thread Lan
It's best to run the data import once per minute. Solr updates work best when they are batched and commits are infrequent. Doing a post per document per transaction would require a Solr commit each time, which could cause the server to hang under update load. Of course you could skip the commit, but
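The batch-and-commit-infrequently advice can also be enforced server-side rather than in the client: Solr's solrconfig.xml supports automatic commits on the update handler (the thresholds below are illustrative, not recommendations for this workload):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Commit when either threshold is reached, whichever comes first -->
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- pending documents since last commit -->
    <maxTime>60000</maxTime>  <!-- milliseconds since first pending document -->
  </autoCommit>
</updateHandler>
```

With this in place, clients can post continuously and never issue explicit commits, and freshness is bounded by maxTime.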

RE: Getting facet counts for 10,000 most relevant hits

2011-09-30 Thread Burton-West, Tom
Hi Lan, I figured out how to do this in a kludgey way on the client side but it seems this could be implemented much more efficiently at the Solr/Lucene level. I described my kludge and posted a question about this to the dev list, but so far have not received any replies

Re: DataImportHandler using new connection on each query

2011-09-30 Thread Chris Hostetter
: Noble? Shalin? what's the point of throwing away a connection that's been : in use for more then 10 seconds? : Hoss, as others have noted, DIH throws away connections which have been idle : for more than the timeout value (10 seconds). The jdbc standard way of : checking for a valid

Re: Production Issue: SolrJ client throwing this error even though field type is not defined in schema

2011-09-30 Thread roz dev
This issue disappeared when we reduced the number of documents which were being returned from Solr. Looks to be some issue with Tomcat or Solr, returning truncated responses. -Saroj On Sun, Sep 25, 2011 at 9:21 AM, pulkitsing...@gmail.com wrote: If I had to give a gentle nudge, I would ask

RE: Getting facet counts for 10,000 most relevant hits

2011-09-30 Thread Chris Hostetter
: I figured out how to do this in a kludgey way on the client side but it : seems this could be implemented much more efficiently at the Solr/Lucene : level. I described my kludge and posted a question about this to the It can, and I have -- but only for the case of a single node... In

Re: Best Solr escaping?

2011-09-30 Thread Chris Hostetter
a) It depends entirely on what QueryParser you are using. If your input is from a human I would suggest using dismax or edismax and not escaping anything, unless you get some type of error, and then maybe give the user a "there was a problem with your query, would you like to try" where
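If escaping is chosen instead, SolrJ ships a helper for it (ClientUtils.escapeQueryChars). A standalone sketch of the same idea, with an illustrative class name and a character list written from memory of the Lucene query syntax, so treat both as assumptions:

```java
public class QueryEscaper {
    // Characters the Lucene/Solr query parsers treat as syntax.
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&;/";

    public static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIALS.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\'); // backslash-escape so the parser sees a literal
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, escaping "(PAGE)" yields "\(PAGE\)", which the standard query parser then matches as literal text instead of treating the parentheses as grouping, which is what caused the error in the original question.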

Re: multiple dateranges/timeslots per doc: modeling openinghours.

2011-09-30 Thread Chris Hostetter
: Another, faulty, option would be to model opening/closing hours in 2 : multivalued date-fields, i.e: open, close. and insert open/close for each : day, e.g: : : open: 2011-11-08:1800 - close: 2011-11-09:0300 : open: 2011-11-09:1700 - close: 2011-11-10:0500 : open: 2011-11-10:1700 - close:
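The reason the two-multivalued-fields option is faulty is that the index keeps no association between the Nth "open" value and the Nth "close" value, so a range query can match an open from one day against a close from another. Client-side, where the pairing can be preserved, the intended check is trivial; a sketch with made-up names, using absolute timestamps so intervals that cross midnight need no special case:

```java
public class OpeningHours {
    /**
     * open[i] and close[i] form one interval. Because the values are
     * absolute timestamps, an 18:00-to-03:00 interval is just open < close
     * with close on the next day. This index pairing is exactly what two
     * independent multivalued Solr fields cannot guarantee.
     */
    public static boolean isOpenAt(long instant, long[] open, long[] close) {
        for (int i = 0; i < open.length; i++) {
            if (instant >= open[i] && instant < close[i]) {
                return true;
            }
        }
        return false;
    }
}
```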

Re: Searching multiple fields

2011-09-30 Thread Chris Hostetter
: I have a use case where I would like to search across two fields but I do not : want to weight a document that has a match in both fields higher than a : document that has a match in only 1 field. use dismax, set the tie param to 0.0 (so it's a true max with no score boost for matching in
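Concretely, that advice maps to request parameters like these (the field names are made up for illustration):

```
q=camera&defType=dismax&qf=title description&tie=0.0
```

With tie=0.0 a document's score is simply the maximum of its per-field scores, so matching in both fields earns no extra credit; raising tie toward 1.0 would blend in the scores of the non-maximum fields.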

join sort query with key, value pair

2011-09-30 Thread abhayd
Hi, I have documents like this: video_id, keyword:seq. For example: 1, service:2 support:1; 2, support:2; 3, service:2; 4, service:1. What I want is a query where I send a video_id and in response see the video with that video_id plus all other related videos with that keyword, sorted by seq. Say the query is