Re: DIH deleting documents

2013-02-25 Thread cveres
Thanks Arcadius, Excellent suggestion about the view. I'll try to simplify things and see how I go. thanks, Csaba -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-deleting-documents-tp4041811p4042663.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Grouping and empty fields

2013-02-25 Thread Oussama Jilal
Ok, Thank you all for your precious help :) On 02/24/2013 04:37 PM, Teun Duynstee wrote: That would depend on your indexing setup. We have a custom application for indexing, so we just make a value up. In our case a GUID (UUID). But I imagine that you could also just copy your id field with a

Re: Slaves always replicate entire index Index versions

2013-02-25 Thread raulgrande83
Hello everybody. I have downloaded the 4.2-SNAPSHOT version that Mark linked at the JIRA and our first tests have been OK. Slaves no longer need to replicate the entire index, and index versions between nodes are the same when the replication process is completed. This 4.2 version is here:

170G index, 1.5 billion documents, out of memory on query

2013-02-25 Thread zqzuk
Hi, I am really frustrated by this problem. I have built an index of 1.5 billion data records, with a size of about 170GB. It's been optimised and has 12 separate files in the index directory, looking like below: _2.fdt --- 58G _2.fdx --- 80M _2.fnm --- 900 bytes _2.si --- 380 bytes

Many(one)-to-many relationship problems

2013-02-25 Thread ipuskaric
Let's say I have a model in my db like this: product:n - n:package Product properties are: name, package ids. Package properties are: price, region, subscription. If the user requirement is to show all product data and product price (and to sort by price) for products that matched some user

id field doesn't match

2013-02-25 Thread b.riez...@pixel-ink.de
Hi all, I have an id field which always contains a string with that schema vw-200130315- Which field type and settings should I use to get exactly this id as a result? Currently I always get more than one result. Kind regards Benjamin

Re: id field doesn't match

2013-02-25 Thread Rafał Kuć
Hello! If what you need is an exact match, try using the simple string type. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Hi all, I have an id field which always contains a string with that schema vw-200130315- Which field type

Re: zk Config URL?

2013-02-25 Thread Darren Govoni
Hi Mark, I downloaded the latest zk and ran it. In my glassfish server, I set these system wide properties: numShards = 1 zkHost = 10.x.x.x:2181 jetty.port = 8080 (port of my domain) bootstrap_config = true I copied all the solr 4.1 dist/*.jar into my glassfish domain lib/ext directory.

Re: 170G index, 1.5 billion documents, out of memory on query

2013-02-25 Thread Artem OXSEED
Hello, adding my 5 cents here as well: it seems that we experienced a similar problem that was supposed to be fixed or not appear at all on 64-bit systems. Our current solution is a custom build of Solr with DEFAULT_READ_CHUNK_SIZE set to 10MB in the FSDirectory class. This fix was done however not

Re: Many(one)-to-many relationship problems

2013-02-25 Thread Michael Della Bitta
Hello Puska, I might not have understood your requirements, but if for a given user, there's only one package per product that should ever be retrieved, I'd make the document represent one package/price combination, and then use a filter query to ensure the user's searches only retrieve

Re: [ANN] vifun: tool to help visually tweak Solr boosting

2013-02-25 Thread Jan Høydahl
Cool. I tried running from source (using the bundled griffonw), but I think the instructions may be wrong, had to download binary dist. The file permissions for bin/vifun in binary dist should have +x so you can execute it with ./vifun What about the ability to override the wt param, so that

Re: solr search integration

2013-02-25 Thread Jan Høydahl
Have you tried one of the extensions out there, such as https://code.google.com/p/magento-community-edition-solr/ ? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 25 Feb 2013, at 14:12, Rohan Thakur rohan.i...@gmail.com wrote:

Max Score Query parser?

2013-02-25 Thread Jan Høydahl
Hi, A customer sends large, deeply nested boolean queries to Solr using the default (lucene) parser. The default scoring is summing up all the scores. For parts of this query they would like to use the Max score instead of the sum, e.g. for q=+A +B +(C D E) we want the max of C,D,E. I was
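To make the request above concrete, here is a minimal Python sketch (plain arithmetic, not Solr or Lucene code) of the difference between summing the optional clauses and taking only their maximum for q=+A +B +(C D E). The per-clause scores are made-up numbers, and the sketch ignores coord and normalization factors that real scoring may apply.

```python
def sum_score(required, optional):
    """Sum-style scoring as described in the post: every matching clause adds its score."""
    return sum(required) + sum(optional)


def max_score(required, optional):
    """Requested behaviour: required clauses still add up, but the optional
    group (C D E) contributes only its best-scoring clause."""
    return sum(required) + (max(optional) if optional else 0.0)


# Hypothetical clause scores, for illustration only.
required = [1.0, 2.0]        # A, B
optional = [0.5, 1.5, 0.25]  # C, D, E

print(sum_score(required, optional))
print(max_score(required, optional))
```

With these assumed numbers the optional group contributes 2.25 under sum scoring but only 1.5 (the score of D) under max scoring.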

Re: Slaves always replicate entire index Index versions

2013-02-25 Thread Mark Miller
On Feb 25, 2013, at 5:54 AM, raulgrande83 raulgrand...@hotmail.com wrote: Mark, is there going to be an official 4.2 release soon? I've suggested on the dev mailing list that I will create a Lucene/Solr 4.2 release within the next few weeks unless someone beats me to it. I can't do it this week, I

How to set the shardid?

2013-02-25 Thread Markus.Mirsberger
Hi, I have two servers, each server hosting one shard in a collection. I'd like to have one server have the same shardId for every collection I create (e.g. shard1 on server1 and shard2 on server2). I thought this would work by setting -DshardId=shard1 when starting the server. But the shardId's shard1

Re: Introducing Solrstrap: A blazing fast tool for querying Solr in a Googleish fashion

2013-02-25 Thread Jan Høydahl
Great Fergus, You have really been working on this since the MeetUp in Oslo! Impressive how much you can do with little code. Have you started thinking about UI widget support for query box, breadcrumb path, facets, paging controls etc? Are you going to bundle in a particular UI widget

Re: Many(one)-to-many relationship problems

2013-02-25 Thread ipuskaric
Hi Michael, As far as I can see there are two ways to do that: 1) store all package data in product documents 2) have separate documents for packages In reality there are lots of packages for every product. So if I make a document for every combination product+package I'll get lots of

Re: Many(one)-to-many relationship problems

2013-02-25 Thread Michael Della Bitta
Hi Ivan, Generally the denormalization strategies you might use to optimize a relational database are antipatterns when dealing with Solr, so I wouldn't hesitate to give this option a try. Solr's very good at reducing the footprint of a field value duplicated across many documents down to a

Re: How to set the shardid?

2013-02-25 Thread Mark Miller
On Feb 25, 2013, at 10:00 AM, Markus.Mirsberger markus.mirsber...@gmx.de wrote: How can I fix the shardId used at one server when I create a collection? (I'm using the solrj collections api to create collections) You can't do it with the collections API currently. If you want to control the

Re: numFound is not correct while using Result Grouping

2013-02-25 Thread Teun Duynstee
You have to set group.ngroups=true (see http://wiki.apache.org/solr/FieldCollapsing). Be aware that including the number of groups is a surprisingly heavy operation, though. Teun 2013/2/25 Nicholas Ding nicholas...@gmail.com Hello, I grouped the result, and set group.main=true. I was
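As a rough illustration of the advice above, the following Python sketch assembles the request parameters for a grouped query with the group count enabled. The parameters `group`, `group.field` and `group.ngroups` are the ones documented on the FieldCollapsing wiki page; the query string and the field name `category` are assumptions made up for the example.

```python
from urllib.parse import urlencode


def grouping_params(query, group_field):
    """Build Solr request parameters for result grouping with ngroups enabled."""
    return {
        "q": query,
        "group": "true",
        "group.field": group_field,
        # Asks Solr to report the number of groups; as noted above, this can be costly.
        "group.ngroups": "true",
    }


# "category" is a hypothetical field name used only for illustration.
grouped_query_string = urlencode(grouping_params("*:*", "category"))
print(grouped_query_string)
```

The resulting string would be appended to the select handler URL of whichever core is being queried.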

CurrencyField querying in Solr 4

2013-02-25 Thread Gerald Blanck
We are attempting to leverage the CurrencyField type. We have defined the currency field type as: <fieldType name="currency" class="solr.CurrencyField" precisionStep="8" defaultCurrency="USD" currencyConfig="currency.xml" /> And defined a field as: <dynamicField name="*_money" type="currency" indexed="true"

Re: 170G index, 1.5 billion documents, out of memory on query

2013-02-25 Thread Shawn Heisey
On 2/25/2013 4:06 AM, zqzuk wrote: Hi I am really frustrated by this problem. I have built an index of 1.5 billion data records, with a size of about 170GB. It's been optimised and has 12 separate files in the index directory, looking like below: _2.fdt --- 58G _2.fdx --- 80M _2.fnm---

Re: numFound is not correct while using Result Grouping

2013-02-25 Thread Carlos Maroto
Use group.ngroups, check it in the Solr wiki for FieldCollapsing Carlos Maroto Search Architect at Search Technologies (www.searchtechnologies.com) Nicholas Ding nicholas...@gmail.com wrote: Hello, I grouped the result, and set group.main=true. I was expecting the numFound equals to the

User Query Processing Sanity Check

2013-02-25 Thread z...@navigo.com
Have been working with Solr for about 6 months, straightforward stuff, basic keyword searches. We want to move to more advanced stuff, to support 'must include', 'must not include', set union, etc. I.e., more advanced query strings. We seem to have hit a block, and are considering two paths and

Re: numFound is not correct while using Result Grouping

2013-02-25 Thread Nicholas Ding
Thanks Teun and Carlos, I set group.ngroups=true, but I don't have this ngroup number when I was using group.main = true. On Mon, Feb 25, 2013 at 12:02 PM, Carlos Maroto cmar...@searchtechnologies.com wrote: Use group.ngroups, check it in the Solr wiki for FieldCollapsing Carlos Maroto

Re: 170G index, 1.5 billion documents, out of memory on query

2013-02-25 Thread zqzuk
Hi, thanks for your advice! I have deliberately allocated 32G to JVM, with the command java -Xmx32000m -jar start.jar etc. I am using our server which I think has a total of 48G. However it still crashes because of that error when I specify any keywords in my query. The only query that worked, as

Re: 170G index, 1.5 billion documents, out of memory on query

2013-02-25 Thread Timothy Potter
The other issue you need to be worried about is long full GC pauses with -Xmx32000m. Maybe try reducing your JVM Heap considerably (e.g. -Xmx8g) and switching to the MMapDirectory - see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html In solrconfig.xml, this would be:

Re: 170G index, 1.5 billion documents, out of memory on query

2013-02-25 Thread Shawn Heisey
On 2/25/2013 11:05 AM, zqzuk wrote: I have deliberately allocated 32G to JVM, with the command java -Xmx32000m -jar start.jar etc. I am using our server which I think has a total of 48G. However it still crashes because of that error when I specify any keywords in my query. The only query that

Re: [ANN] vifun: tool to help visually tweak Solr boosting

2013-02-25 Thread jmlucjav
Jan, thanks for looking at this! - Running from source: would you care to send me the error you get (if any) when running from source? I assume you have griffon1.1.0 installed right? - Binary dist: the distrib is created by griffon, so I'll check if the permission issue (I develop on windows,

RE: User Query Processing Sanity Check

2013-02-25 Thread Swati Swoboda
Maybe I am not understanding correctly, but have you overlooked the qf parameter for Edismax? http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29 Suppose you want to search for the phrase apples and bananas in title, summary, and body. You also want it to have greater emphasis
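A minimal sketch of the qf idea, in Python: it builds edismax request parameters that search title, summary and body with per-field boosts. The boost values are assumptions chosen for illustration (the message is cut off before it gives any), and `edismax_params` is a hypothetical helper, not part of any Solr client.

```python
def edismax_params(user_query, field_boosts):
    """Build request parameters for an edismax query over several boosted fields."""
    # qf takes a space-separated list of field^boost entries.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"defType": "edismax", "q": user_query, "qf": qf}


# Hypothetical boosts: title weighted highest, then summary, then body.
edismax_request = edismax_params(
    "apples and bananas",
    {"title": 3, "summary": 2, "body": 1},
)
print(edismax_request["qf"])
```

Keeping the boosts in a dictionary makes it easy to tune field weights without touching the user's query string.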

Re: [ANN] vifun: tool to help visually tweak Solr boosting

2013-02-25 Thread Roman Chyla
Oh, wonderful! Thank you :) I was hacking some simple python/R scripts that can do a similar job for qf... the idea was to let the algorithm create possible combinations of params and compare that against the baseline. Would it be possible/easy to instruct the tool to harvest results for

Re: 170G index, 1.5 billion documents, out of memory on query

2013-02-25 Thread zqzuk
Thanks again for your kind input! I followed Tim's advice and tried to use MMapDirectory. Then I get an out-of-memory error on solr startup (tried giving only 8G, 4G to JVM). I guess this truly indicates that there isn't sufficient memory for such a huge index. On another thread I posted days before,

Re: 170G index, 1.5 billion documents, out of memory on query

2013-02-25 Thread Michael Della Bitta
Hello Zqzuk, It's true that this index is probably too big for a single shard, but make sure you heed Shawn's advice and use a 64-bit JVM in any case! Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271

Re: Max Score Query parser?

2013-02-25 Thread Mikhail Khludnev
Jan, I think it's worth starting by extending LuceneQParser. After the parent's parse() returns a query instance, it can be cast to BooleanQuery; after that it's possible to check that all clauses have the SHOULD occur, and to create an instance of DisjunctionMaxQuery() from the given clauses. Am

Re: [ANN] vifun: tool to help visually tweak Solr boosting

2013-02-25 Thread jmlucjav
Hi Roman, I read with interest your thread about relevance testing a couple of weeks ago and yes, I noticed it was related somehow. But what you were proposing there is a different approach I think. In my tool, you have some baseline setting (it might be good or bad), and using a single query,

Re: 170G index, 1.5 billion documents, out of memory on query

2013-02-25 Thread Timothy Potter
Do you have the stack trace for the OOM during startup when using MMapDirectory? That would be interesting to know. Cheers, Tim On Mon, Feb 25, 2013 at 1:15 PM, zqzuk ziqizh...@hotmail.co.uk wrote: Hi Michael Yes I have double checked and pretty sure its 64bit java. Thanks -- View this

Re: numFound is not correct while using Result Grouping

2013-02-25 Thread Teun Duynstee
Ah, I see. The docs say "Although this result format does not have as much information, it may be easier for existing solr clients to parse." I guess the ngroups value could be added to this format, but apparently it isn't. I do agree with you that to be useful (as in possible to read for a client

Toulouse JUG looks for speakers

2013-02-25 Thread Alexis Krier
Hello all, I am from the Toulouse JUG in France. I'm looking for speakers to talk about solr in our JUG, anybody? French or English are welcome. thx Alexis

Re: Max Score Query parser?

2013-02-25 Thread Jack Krupansky
Bite the bullet and use a function query for the boost: bf=max(query({!v='field:C'}),query({!v='field:D'}),query({!v='field:E'})) -- Jack Krupansky -Original Message- From: Jan Høydahl Sent: Monday, February 25, 2013 6:32 AM To: solr-user@lucene.apache.org Subject: Max Score Query
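The bf expression above follows a regular pattern, so it can be generated for any list of clauses. This is a small Python sketch of that; `max_boost` is a hypothetical helper name, and it assumes the clause strings contain no single quotes (those would need escaping, which is not handled here).

```python
def max_boost(clauses):
    """Wrap each clause in query({!v='...'}) and combine them with max(...)."""
    subqueries = ",".join("query({!v='%s'})" % clause for clause in clauses)
    return "max(%s)" % subqueries


# Reproduces the expression from the message for clauses C, D and E.
bf = max_boost(["field:C", "field:D", "field:E"])
print(bf)
```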

Re: numFound is not correct while using Result Grouping

2013-02-25 Thread Amit Nithian
Yeah I had a similar problem. I filed and submitted this patch: https://issues.apache.org/jira/browse/SOLR-4310 Let me know if this is what you are looking for! Amit On Mon, Feb 25, 2013 at 1:50 PM, Teun Duynstee t...@duynstee.com wrote: Ah, I see. The docs say Although this result format

Re: [ANN] vifun: tool to help visually tweak Solr boosting

2013-02-25 Thread Amit Nithian
This is cool! I had done something similar except changing via JConsole/JMX: https://issues.apache.org/jira/browse/SOLR-2306 We had something not as nice at Zvents but I wanted to expose these as MBean properties so you could change them via any JMX UI like JVisualVM Cheers! Amit On Mon, Feb

Re: CurrencyField querying in Solr 4

2013-02-25 Thread Chris Hostetter
: my_money:[* TO *] : : The result is ALL documents (even though only 1 document actually has this : field populated. ... : +my_money:[* TO *] -my_money:0 : : We get the single document back. Hmmm, i can reproduce, and that definitely doesn't make any sense to me. There are some open

Re: Reply: solr shards

2013-02-25 Thread rulinma
I have some comments on this topic: 1. compositeId with numShards set: 1.1 unique id (a hash algorithm picks the shard) 1.2 in particular, ids with the same prefix before ! will be routed to the same shard if you put ! in the id 2. numShards not set: 2.1 the user uses _field_ (schema.xml) to set where the data goes

Distributed Search and the Stale Check

2013-02-25 Thread Ryan Zezeski
Hello Solr Users, I just wrote up a piece about some work I did recently to improve the throughput of distributed search. http://www.zinascii.com/2013/solr-distributed-search-and-the-stale-check.html The short of it is that the stale check in Apache's HTTP Client used by SolrJ can add a lot of

RE: Distributed Search and the Stale Check

2013-02-25 Thread Michael Ryan
I don't have anything to add besides saying this is awesome. Great analysis. -Michael

Re: Distributed Search and the Stale Check

2013-02-25 Thread Mark Miller
On Feb 25, 2013, at 8:14 PM, Ryan Zezeski rzeze...@gmail.com wrote: I would like to see a similar fix made upstream and that is why I am posting here. Please file a JIRA issue and attach your patch. Great write up! (Saw it pop up on twitter, so I read it a little earlier). - Mark

Re: Distributed Search and the Stale Check

2013-02-25 Thread Yonik Seeley
On my particular benchmark rig, each stale check call accounted for an additional ~10ms. That's insane! It's still not even clear to me how the stale check works (reliably). Couldn't the server still close the connection between the stale check and the send of data by the client? -Yonik

RE: Solr Suggester component doesn't return hits for non-English words

2013-02-25 Thread Carlos Maroto
Hi Dejan, I wouldn't say your problem is because the words are non-English words, as there is nothing in Solr to indicate whether the terms are in English or not. I think it is a configuration thing in your implementation for the current data set or test. I would start by trying the following: -

Re: zk Config URL?

2013-02-25 Thread Anirudha Jadhav
Solr cloud reads solr cfg files from zookeeper. You need to push the cfg to zookeeper and link the collection to the cfg. This is exactly what Mark suggested earlier in the thread. This is also explained in the solr cloud wiki. On Monday, February 25, 2013, Darren Govoni wrote: Hi Mark, I download

Re: Solr Suggester component doesn't return hits for non-English words

2013-02-25 Thread Jack Krupansky
Try changing splitOnCaseChange=1 to splitOnCaseChange=0, and fully reindex your data. One possibility is that you may have indexed Marcos and Dejan before adding the lower case filter, which would cause the query to be lower case even though the indexed data might not be lower case. -- Jack

Re: zk Config URL?

2013-02-25 Thread darren
Ok. But it's way more complicated than it should be. It should work smarter. Original message From: Anirudha Jadhav aniru...@nyu.edu Date: To: solr-user@lucene.apache.org Subject: Re: zk Config URL? Solr cloud reads solr cfg

Re: Distributed Search and the Stale Check

2013-02-25 Thread Ryan Zezeski
On Mon, Feb 25, 2013 at 8:42 PM, Yonik Seeley yo...@lucidworks.com wrote: That's insane! It is insane. Keep in mind this was a 5-node cluster on the same physical machine sharing the same resources. It consists of 5 SmartOS zones on the same global zone. On my MacBook Pro I saw ~1.5ms

Re: splitting big, existing index into shards

2013-02-25 Thread Mark Miller
On Thu, Feb 21, 2013 at 1:19 PM, Upayavira u...@odoko.co.uk wrote: A splitter that uses the same split technique but uses the shard assignment algorithm from SolrCloud could be a useful thing. There is some ongoing work on shard splitting, and I assume a splitter like this is part of that.

Re: Poll: SolrCloud vs. Master-Slave usage

2013-02-25 Thread Lance Norskog
Do you use replication instead, or do you just have one instance? On 02/25/2013 07:55 PM, Otis Gospodnetic wrote: Hi, Quick poll to see what % of Solr users use SolrCloud vs. Master-slave setup: http://blog.sematext.com/2013/02/25/poll-solr-cloud-or-not/ I have to say I'm surprised with the

Re: Multi-threaded post.jar?

2013-02-25 Thread Otis Gospodnetic
Upayavira, did you ever do this? Ha, look at my email from 20 days ago and this: https://github.com/javanna/elasticshell Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Feb 6, 2013 at 2:38 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Btw wouldn't this be a chance to

DataDirectory: relative path doesn't work

2013-02-25 Thread Patrick Mi
I am running Solr4.0/Tomcat 7 on Centos6 According to this page http://wiki.apache.org/solr/SolrConfigXml if dataDir is not absolute, then it is relative to the instanceDir of the SolrCore. However the index directory is always created under the directory where I start the Tomcat (startup.sh)

Re: Slaves always replicate entire index Index versions

2013-02-25 Thread Artyom
Interesting that there is no such bug if I disable index compression, as discussed here: https://issues.apache.org/jira/browse/SOLR-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566364#comment-13566364 -- View this message in context:

Re: Poll: SolrCloud vs. Master-Slave usage

2013-02-25 Thread Walter Underwood
I cannot answer yes to any of those options. Master/slave and cloud have different strengths and weaknesses. We will use each one where it is appropriate. The loose coupling in master/slave is a very good thing and increases robustness for a corpus that does not have tight freshness