Re: Quick Questions

2013-03-08 Thread Upayavira
In example/cloud-scripts/ you will find a Solr specific zkCli tool to upload/download configs. You will need to reload a core/collection for the changes to take effect. Upayavira On Fri, Mar 8, 2013, at 07:02 AM, Nathan Findley wrote: I am setting up solrcloud with zookeeper. - I am
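
The reload step mentioned above can be triggered over HTTP. The following is a minimal sketch, assuming a Solr 4.x node whose CoreAdmin and Collections APIs accept a RELOAD action; the base URL and the collection name are placeholders, not values from this thread. It only builds the request URL and does not contact a server.

```python
from urllib.parse import urlencode


def reload_url(base_url, name, kind="collection"):
    """Build the admin URL that asks Solr to reload a collection or a core.

    base_url and name are illustrative placeholders (assumptions).
    """
    if kind == "collection":
        path, params = "/admin/collections", {"action": "RELOAD", "name": name}
    elif kind == "core":
        path, params = "/admin/cores", {"action": "RELOAD", "core": name}
    else:
        raise ValueError("kind must be 'collection' or 'core'")
    return base_url.rstrip("/") + path + "?" + urlencode(params)


if __name__ == "__main__":
    print(reload_url("http://localhost:8983/solr", "collection1"))
    print(reload_url("http://localhost:8983/solr", "collection1", kind="core"))
```

Fetching the printed URL (for example with curl or a browser) after uploading the new config to ZooKeeper should make the change visible.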

SOLR - Recommendation on architecture

2013-03-08 Thread Kobe J
We are planning to use SOLR 4.1 for full text indexing. Following is the hardware configuration of the web server that we plan to install SOLR on:- *CPU*: 2 x Dual Core (4 cores) *RAM*: 12GB *Storage*: 212GB *OS Version* – Windows 2008 R2 The dataset to be imported will have approx. 800k

R: Query parsing issue

2013-03-08 Thread Francesco Valentini
Thank you very much. I've tried both of the ways you suggested, and then chose to rewrite the parse method by extending the ExtendedDismaxQParser class. Francesco. -Original message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: Wednesday, 6 March

Re: SOLR - Recommendation on architecture

2013-03-08 Thread Gora Mohanty
On 8 March 2013 14:19, Kobe J kobe.free.wo...@gmail.com wrote: We are planning to use SOLR 4.1 for full text indexing. Following is the hardware configuration of the web server that we plan to install SOLR on:- *CPU*: 2 x Dual Core (4 cores) *RAM*: 12GB *Storage*: 212GB *OS Version* –

Re: JoinQuery and scores

2013-03-08 Thread Upayavira
I would recommend reading up on Lucene scoring, there's a lot to understand there. The join query parser (triggered by the use of {!join} syntax) searches for a list of documents matching the term specified, and provides a list of matching IDs. It then performs a second search based upon those

Re: Solr 4.x auto-increment/sequence/counter functionality.

2013-03-08 Thread mark12345
So I think I took the easiest option by creating an UpdateRequestProcessor implementation (I was unsure of the performance implications and object model of ScriptUpdateProcessor). The below DocumentCreationDetailsProcessorFactory class seems to achieve my aim of allowing me to sort my Solr

SOLR-3076 for beginners?

2013-03-08 Thread Uwe Reh
Hi, blockjoin seems to be a really cool feature. Unfortunately I'm too dumb to get the patch running. I don't even know what to do :-( Is there an example, a howto or a cookbook anywhere, other than using elasticsearch or bare lucene? Uwe

Re: SOLR - Recommendation on architecture

2013-03-08 Thread Jilal Oussama
I would not recommend Windows too 2013/3/8 Kobe J kobe.free.wo...@gmail.com We are planning to use SOLR 4.1 for full text indexing. Following is the hardware configuration of the web server that we plan to install SOLR on:- *CPU*: 2 x Dual Core (4 cores) *RAM*: 12GB *Storage*: 212GB

Re: SOLR - Recommendation on architecture

2013-03-08 Thread Upayavira
Because? Upayavira On Fri, Mar 8, 2013, at 09:27 AM, Jilal Oussama wrote: I would not recommend Windows too 2013/3/8 Kobe J kobe.free.wo...@gmail.com We are planning to use SOLR 4.1 for full text indexing. Following is the hardware configuration of the web server that we plan to

Re: Quick Questions

2013-03-08 Thread Nathan Findley
On 03/08/2013 05:06 PM, Upayavira wrote: In example/cloud-scripts/ you will find a Solr specific zkCli tool to upload/download configs. You will need to reload a core/collection for the changes to take effect. Upayavira On Fri, Mar 8, 2013, at 07:02 AM, Nathan Findley wrote: I am setting up

Re: SOLR - Recommendation on architecture

2013-03-08 Thread Upayavira
If you are attempting to assess performance, you should use as many records as you can muster. A Lucene index does start to struggle at a certain size, and you may be getting close to that, depending upon the size of your fields. Are you suggesting that you would host other services on the server

Mark document as hidden

2013-03-08 Thread lboutros
Dear all, I would like to mark documents as hidden. I could add a field hidden and set its value to true, but then the whole document would be reindexed. And external file fields are not searchable. I could store the document keys in an external database and filter the result with these ids. But if

Re: Mark document as hidden

2013-03-08 Thread Upayavira
Without java coding, you cannot filter on things that aren't in your index. You would need to re-index the document, but maybe you could make use of atomic updates to just change the hidden field without needing to push the whole document again. Upayavira On Fri, Mar 8, 2013, at 11:40 AM,
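
As a rough illustration of the atomic-update suggestion above, the sketch below builds the JSON body for an update that sets a single field. It assumes Solr 4's atomic update syntax (a `set` modifier on the field), a uniqueKey field named `id`, and a boolean field named `hidden`; both field names are assumptions for illustration. The body would be POSTed to the collection's update handler, which is not shown here.

```python
import json


def hidden_update(doc_id, hidden=True, id_field="id", flag_field="hidden"):
    """Return a JSON body that changes only the flag field of one document.

    id_field and flag_field are hypothetical schema names.
    """
    return json.dumps([{id_field: doc_id, flag_field: {"set": hidden}}])


if __name__ == "__main__":
    print(hidden_update("doc1"))
```

Solr still rebuilds the document internally from its stored fields, so this mainly saves the client from pushing the whole document again.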

ResourceLoader newInstance

2013-03-08 Thread Peter Kirk
Hi Can someone explain to me the point of the method public <T> T newInstance(String cname, Class<T> expectedType) in interface org.apache.solr.common.ResourceLoader (or org.apache.lucene.analysis.util.ResourceLoader)? If I want to implement a ResourceLoader, what is the purpose of me

Re: Mark document as hidden

2013-03-08 Thread Erik Hatcher
External file fields, via function queries, are still usable for filtering. Consider using the frange function query to filter out hidden documents. Erik On Mar 8, 2013, at 6:40, lboutros boutr...@gmail.com wrote: Dear all, I would like to mark documents as hidden. I could add a

RE: inconsistent number of results returned in solr cloud

2013-03-08 Thread Hardik Upadhyay
Hi, I am using solr 4.0 (Not BETA), and have created a 2 shard, 2 replica configuration. But when I query solr with a filter query it returns an inconsistent result count. Without the filter query it returns the same consistent result count. I don't understand why. Can anyone help with this? Best Regards

SolrCloud: port out of range:-1

2013-03-08 Thread roySolr
Hello, I have some problems with Solrcloud and Zookeeper. I have 2 servers and I want to have a solr instance on both servers. Both solr instances run an embedded zookeeper. When I try to start the first one I get the error: port out of range:-1. The command I run to start solr with embedded

Re: inconsistent number of results returned in solr cloud

2013-03-08 Thread mike st. john
Check for dup IDs. A quick way is to facet using the id as the field and set the mincount to 2. -Mike Hardik Upadhyay wrote: HI I am using solr 4.0 (Not BETA), and have created 2 shard 2 replica configuration. But when I query solr with filter query it returns inconsistent result count.
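
The duplicate check described above can be expressed as a set of query parameters. This is a sketch only, assuming the uniqueKey field is called `id` and that the standard facet parameters (`facet.field`, `facet.mincount`, `facet.limit`) are available; it builds the query string and does not send it.

```python
from urllib.parse import urlencode


def duplicate_id_query(id_field="id"):
    """Build a query string that lists id values occurring more than once.

    id_field is an assumed field name.
    """
    params = {
        "q": "*:*",           # match every document
        "rows": 0,            # only the facet counts are of interest
        "facet": "true",
        "facet.field": id_field,
        "facet.mincount": 2,  # keep only ids seen at least twice
        "facet.limit": -1,    # do not cap the number of facet values
    }
    return urlencode(params)


if __name__ == "__main__":
    print("/select?" + duplicate_id_query())
```

Any id returned in the facet list exists in more than one place, which would explain result counts that change between requests.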

Re: Mark document as hidden

2013-03-08 Thread lboutros
Excellent Erik ! It works perfectly. Normal filter queries are cached. Is it the same for frange filter queries like this one ? : fq={!frange l=0 u=10}removed_revision Thanks to both for your answers. Ludovic. - Jouve France.

Re: Mark document as hidden

2013-03-08 Thread lboutros
One more question, is there already a way to update the external file (add values) in Solr ? Ludovic. - Jouve France.

Re: SOLR - Recommendation on architecture

2013-03-08 Thread Walter Underwood
Your server seems to be about the right size, but as everyone else has said, it depends on the kinds of queries. Solr should be the only service on the system. Solr can make heavy use of the disk which will interfere with other processes. If you are lucky enough to get the system tuned to run

Re: Mark document as hidden

2013-03-08 Thread Erik Hatcher
Ludovic - Yes, this query would be cached (unless you say cache=false). Erik On Mar 8, 2013, at 10:26 , lboutros wrote: Excellent Erik ! It works perfectly. Normal filter queries are cached. Is it the same for frange filter queries like this one ? : fq={!frange l=0
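
The frange filter discussed in this exchange, including the `cache=false` local parameter Erik mentions, could be assembled like this. It is a small sketch: the field name `removed_revision` and the bounds 0 and 10 come from Ludovic's message, while the helper function itself is hypothetical.

```python
def frange_fq(field, lower=None, upper=None, cache=True):
    """Build an fq value using the frange query parser.

    Omitting a bound leaves that side of the range open; cache=False adds
    the cache=false local parameter so the filter is not cached.
    """
    parts = ["!frange"]
    if lower is not None:
        parts.append(f"l={lower}")
    if upper is not None:
        parts.append(f"u={upper}")
    if not cache:
        parts.append("cache=false")
    return "{" + " ".join(parts) + "}" + field


if __name__ == "__main__":
    print(frange_fq("removed_revision", 0, 10))
    print(frange_fq("removed_revision", 0, 10, cache=False))
```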

Re: Mark document as hidden

2013-03-08 Thread Erik Hatcher
The external file is maintained externally. Solr only reads it, and does not have a facility to write to it, if that is what you're asking. Erik On Mar 8, 2013, at 10:43 , lboutros wrote: One more question, is there already a way to update the external file (add values) in Solr ?

RE: Migrate Solr 3.4 w/ solr-1255 GeoHash to Solr 4

2013-03-08 Thread David Smiley (@MITRE.org)
The underlying index format is unchanged between SOLR-2155 and Solr 4 provided that this is only about indexing points, and SOLR-2155 could only index points anyway. To really ensure it's drop-in compatible, specify maxLevels=12 *instead of* setting maxDistErr (which indirectly derives a

RE: Migrate Solr 3.4 w/ solr-1255 GeoHash to Solr 4

2013-03-08 Thread David Smiley (@MITRE.org)
You're supposed to add geo point data in latitude, longitude format, although some other variations work. Is your updating process supplying a geohash instead? If so you could write a simple Solr UpdateRequestProcessor to convert it to the expected format. But that doesn't help the fact that
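
The conversion step suggested above (geohash in, "latitude,longitude" out) might look like the sketch below. It is not a Solr UpdateRequestProcessor, only the decoding logic such a processor would need, and it assumes the standard base-32 geohash alphabet with longitude encoded in the first bit. The function names are illustrative.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_to_lat_lon(geohash):
    """Decode a geohash to the centre point of its cell as (lat, lon)."""
    lat = [-90.0, 90.0]
    lon = [-180.0, 180.0]
    is_lon = True  # bits alternate between longitude and latitude
    for ch in geohash.lower():
        value = BASE32.index(ch)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the interval
            else:
                rng[1] = mid  # keep the lower half of the interval
            is_lon = not is_lon
    return (lat[0] + lat[1]) / 2, (lon[0] + lon[1]) / 2


def to_solr_point(geohash):
    """Format the decoded point as 'lat,lon'."""
    lat, lon = geohash_to_lat_lon(geohash)
    return f"{lat},{lon}"


if __name__ == "__main__":
    print(to_solr_point("ezs42"))
```

The decoded value is the centre of the geohash cell, so precision depends on the length of the incoming hash.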

Re: Mark document as hidden

2013-03-08 Thread lboutros
Ok, thanks Erik. Do you see any problem in modifying the Update handler in order to append some values to this file ? Ludovic - Jouve France.

How can I limit my Solr search to an arbitrary set of 100,000 documents?

2013-03-08 Thread Andy Lester
We've got an 11,000,000-document index. Most documents have a unique ID called flrid, plus a different ID called solrid that is Solr's PK. For some searches, we need to be able to limit the searches to a subset of documents defined by a list of FLRID values. The list of FLRID values can

Re: Mark document as hidden

2013-03-08 Thread lboutros
I could create an UpdateRequestProcessorFactory that could update this file. Does that seem better ? - Jouve France.

Re: How can I limit my Solr search to an arbitrary set of 100,000 documents?

2013-03-08 Thread Roman Chyla
hi Andy, It seems like a common type of operation and I would be also curious what others think. My take on this is to create a compressed intbitset and send it as a query filter, then have the handler decompress/deserialize it, and use it as a filter query. We have already done experiments with
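
To make the idea of a compressed bitset concrete, here is a toy sketch of the client-side encoding and the matching decode. It is not Roman's implementation: it assumes the IDs are small non-negative integers, uses zlib and URL-safe base64 as stand-ins for whatever compression and transport encoding a real handler would use, and the function names are invented for illustration.

```python
import base64
import zlib


def encode_ids(ids, size):
    """Pack integer ids (each < size) into a bitset, compress, and base64 it."""
    bits = bytearray((size + 7) // 8)
    for i in ids:
        bits[i >> 3] |= 1 << (i & 7)  # set bit i
    return base64.urlsafe_b64encode(zlib.compress(bytes(bits))).decode("ascii")


def decode_ids(payload):
    """Reverse encode_ids and return the ids in ascending order."""
    bits = zlib.decompress(base64.urlsafe_b64decode(payload))
    return [
        byte_index * 8 + bit
        for byte_index, byte in enumerate(bits)
        for bit in range(8)
        if (byte >> bit) & 1
    ]


if __name__ == "__main__":
    payload = encode_ids([1, 5, 90], size=100)
    print(payload, decode_ids(payload))
```

A sparse bitset compresses well, which is what makes sending a large ID set as a single request parameter plausible.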

Re: How can I limit my Solr search to an arbitrary set of 100,000 documents?

2013-03-08 Thread Walter Underwood
First, terms used to subset the index should be a filter query, not part of the main query. That may help, because the filter query terms are not used for relevance scoring. Have you done any system profiling? Where is the bottleneck: CPU or disk? There is no point in optimising things before
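
The separation Walter describes, with the user's query in `q` and the subset restriction in `fq`, can be sketched as follows. The field name `flrid` comes from Andy's message; the helper and the sample query are hypothetical. Note that a plain boolean list like this is subject to Solr's maxBooleanClauses setting, so it would not scale to 100,000 values without further changes.

```python
def subset_params(user_query, ids, id_field="flrid"):
    """Keep the scored query in q and restrict the document set through fq."""
    clause = " OR ".join(str(i) for i in ids)
    return {"q": user_query, "fq": f"{id_field}:({clause})"}


if __name__ == "__main__":
    print(subset_params("title:solr", [1, 5, 90]))
```

Because the restriction lives in `fq`, the ID terms do not take part in relevance scoring, and the filter can be cached independently of the main query.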

Re: InvalidShapeException when using SpatialRecursivePrefixTreeFieldType with custom worldBounds

2013-03-08 Thread David Smiley (@MITRE.org)
Hi Jon. If you're able to trigger an IndexOutOfBoundsException out of the prefix tree then please file a bug (to the Lucene project, not Solr). I'll look into it when I have time. I need to add a Wiki page on the use of spatial for time ranges; there are some tricks to it. Nevertheless you've

Re: Solr 4.1 UI fail to display result

2013-03-08 Thread Stefan Matheis
I know, it's a bit late on this thread, but for the record - filed and already fixed: https://issues.apache.org/jira/browse/SOLR-4349 On Saturday, February 2, 2013 at 6:35 PM, J Mohamed Zahoor wrote: It works In chrome though... ./Zahoor@iPhone On 02-Feb-2013, at 4:34 PM, J Mohamed

Re: How can I limit my Solr search to an arbitrary set of 100,000 documents?

2013-03-08 Thread Roman Chyla
I think we are speaking of a use case where the user wants to limit the search to a collection of documents but there is no unifying (easy) way to select those papers - besides a loong query: id:1 OR id:5 OR id:90... And no, the latency of several hundred milliseconds is perfectly achievable with

Re: How to add shard in 4.2-snapshot

2013-03-08 Thread Mark Miller
On Mar 8, 2013, at 12:23 AM, Jam Luo cooljam2...@gmail.com wrote: Hi I use the 4.2-snapshot version, git sha id is f4502778b263849a827e89e45d37b33861f225f9 . I deployed a SolrCloud cluster with 3 nodes, one core per node, each in a different shard. The JVM argument is -DnumShards=3.

Re: Dynamic schema design: feedback requested

2013-03-08 Thread Steve Rowe
Hi Jan, On Mar 6, 2013, at 4:50 PM, Jan Høydahl jan@cominvent.com wrote: Will ZK get pushed the serialized monolithic schema.xml / schema.json from the node which changed it, and then trigger an update to the rest of the cluster? Yes. I was kind of hoping that once we have introduced

Re: SolrCloud: port out of range:-1

2013-03-08 Thread Shawn Heisey
On 3/8/2013 7:37 AM, roySolr wrote: java -Djetty.port=4110 -DzkRun=10.100.10.101:5110 -DzkHost=10.100.10.101:5110,10.100.10.102:5120 -Dbootstrap_conf=true -DnumShards=1 -Xmx1024M -Xms512M -jar start.jar It runs Solr on port 4110, the embedded zk on 5110. The -DzkHost gives the urls of the

Re: Dynamic schema design: feedback requested

2013-03-08 Thread Steve Rowe
On Mar 6, 2013, at 7:50 PM, Chris Hostetter hossman_luc...@fucit.org wrote: I think it would make a lot of sense -- not just in terms of implementation but also for end user clarity -- to have some simple, straightforward to understand caveats about maintaining schema information... 1)

Re: SolrCloud: port out of range:-1

2013-03-08 Thread Tomás Fernández Löbbe
A couple of comments about your deployment architecture too. You'll need to change the zoo.cfg to make the Zookeeper ensemble work with two instances as you are trying to do. Have you done that? The example configuration with the zoo.cfg is intended for a single ZK instance as described in the SolrCloud

Re: SolrCloud: port out of range:-1

2013-03-08 Thread Walter Underwood
A two server Zookeeper ensemble is actually less reliable than a one server ensemble. With two servers, Zookeeper stops working if either of them fails, so there is a higher probability that it will go down. The minimum number for increased reliability is three servers. wunder On Mar 8, 2013,
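
The arithmetic behind this advice is short enough to work through. ZooKeeper needs a strict majority of the ensemble to be up, so the number of failures it survives is the ensemble size minus the majority. The sketch below just computes that; the function name is illustrative.

```python
def tolerated_failures(ensemble_size):
    """Return how many servers may fail while a strict majority remains."""
    quorum = ensemble_size // 2 + 1  # smallest strict majority
    return ensemble_size - quorum


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "servers tolerate", tolerated_failures(n), "failure(s)")
```

One server and two servers both tolerate zero failures, but two servers give twice as many machines that can take the ensemble down, which is why three is the smallest size that improves reliability.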

Re: Dynamic schema design: feedback requested

2013-03-08 Thread Steve Rowe
On Mar 8, 2013, at 2:57 PM, Steve Rowe sar...@gmail.com wrote: multiple collections may share the same config set and thus schema, so what happens if someone does not know this and hits PUT localhost:8983/solr/collection1/schema and it affects also the schema for collection2? Hmm, that's

update some fields vs replace the whole document

2013-03-08 Thread Mingfeng Yang
Generally speaking, which has better performance for Solr? 1. updating some fields or adding new fields into a document. or 2. replacing the whole document. As I understand, updating fields requires searching for the corresponding doc first, and then replacing field values. While replacing the whole

Re: update some fields vs replace the whole document

2013-03-08 Thread Upayavira
With an atomic update, you need to retrieve the stored fields in order to build up the full document to insert back. In either case, you'll have to locate the previous version and mark it deleted before you can insert the new version. I bet that the amount of time spent retrieving stored fields

Re: update some fields vs replace the whole document

2013-03-08 Thread Mingfeng Yang
Then what's the difference between adding a new document vs. replacing/overwriting a document? Ming- On Fri, Mar 8, 2013 at 2:07 PM, Upayavira u...@odoko.co.uk wrote: With an atomic update, you need to retrieve the stored fields in order to build up the full document to insert back. In

Re: High QTime when wildcards in hl.fl are used

2013-03-08 Thread Karol Sikora
I've found more interesting information about using fastVectorHighlighting combined with highlighted fields with wildcards after testing on an isolated group of documents with text content. fvh + fulltext_*: QTime ~4s (!) fvh + fulltext_1234: QTime ~50ms no fvh + fulltext_*: QTime ~600ms no fvh +

Multiple Collections in one Zookeeper

2013-03-08 Thread jimtronic
Hi, I have a solrcloud cluster running several cores and pointing at one zookeeper. For performance reasons, I'd like to move one of the cores on to its own dedicated cluster of servers. Can I use the same zookeeper to keep track of both clusters? Thanks! Jim

Re: Multiple Collections in one Zookeeper

2013-03-08 Thread Michael Della Bitta
Yes, but you'll need to append a sub path on to the zookeeper path for your second cluster. For ex: zookeeper1.example.com,zookeeper2.example.com,zookeeper3.example.com/subpath On Mar 8, 2013 6:46 PM, jimtronic jimtro...@gmail.com wrote: Hi, I have a solrcloud cluster running several cores
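
A sketch of how the connection string in this answer is put together: the host list is comma-separated and the sub path is appended once, after the last host. The host names and `subpath` are taken from the example above; the helper is hypothetical.

```python
def zk_host(hosts, chroot=""):
    """Join ZooKeeper hosts and append an optional sub path once at the end."""
    connect = ",".join(hosts)
    if chroot:
        connect += "/" + chroot.strip("/")
    return connect


if __name__ == "__main__":
    hosts = [
        "zookeeper1.example.com",
        "zookeeper2.example.com",
        "zookeeper3.example.com",
    ]
    print(zk_host(hosts))             # first cluster, root path
    print(zk_host(hosts, "subpath"))  # second cluster, isolated under /subpath
```

Each cluster would pass its own string as the zkHost value, so the two sets of collections never see each other's state.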

RE: Migrate Solr 3.4 w/ solr-1255 GeoHash to Solr 4

2013-03-08 Thread Parks, Harley
Yes. Success. I was able to successfully migrate solr 3.4 w/ solr-2155 solrconfig.xml and schema.xml; but I had to rebuild the database (solr index data folder). fieldType name=geohash_rpt class=solr.SpatialRecursivePrefixTreeFieldType geo=true distErrPct=0 maxLevels=12

Re: update some fields vs replace the whole document

2013-03-08 Thread Jack Krupansky
Generally it will be more a matter of application semantics. Solr makes it reasonably efficient to completely overwrite the existing document and fields, if that is what you want. But, in some applications, it may be desirable to preserve some or most of the existing fields; whether that is

Re: Search a folder with File name and retrieve all the files matched

2013-03-08 Thread Jan Høydahl
Since this is a POC you could simply run this command with the default example schema: cd solr/example/exampledocs java -Dauto -Drecursive=0 -jar post.jar path/to/folder You will get the full file name with path in field resourcename If you need to search just the filename, you can achieve that

Re: Search a folder with File name and retrieve all the files matched

2013-03-08 Thread Erik Hatcher
Thanks, Jan, for making the post tool do this type of thing. Great stuff. The filename would be a good one to add for out of the box goodness. We can easily add just the filename to the index with something like the patch below. And on that note, what else would folks want in an easy to use