Running Solr 4 on Sun vs OpenJDK JVM?

2013-07-23 Thread Cosimo Streppone
Hi, do you have any advice on operating a Solr 4.0 read-only instance with regards to the underlying JVM? In particular I'm wondering about stability and memory usage, but anything else you might add is welcome, when it comes to OpenJDK vs Sun/Oracle Hotspot, v6 vs v7. What are you running,

Re: adding date column to the index

2013-07-23 Thread Gora Mohanty
On 23 July 2013 11:13, Mysurf Mail stammail...@gmail.com wrote: To clarify: I did delete the data in the index and reloaded it (+ commit). (As I said, I have seen it loaded in the sb profiler) [...] Please share your DIH configuration file, and Solr's schema.xml. It must be that somehow the

Referencing solrcore.properties in ZooKeeper

2013-07-23 Thread sathish_ix
Hi, I have uploaded solrconfig.xml, db-data-config.xml, and solrcore.properties (ABC.properties) files into ZooKeeper. Below is my solr.xml file: <?xml version="1.0" encoding="UTF-8" ?> <solr persistent="true"> <cores defaultCoreName="ABC" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}"

Re: adding date column to the index

2013-07-23 Thread Mysurf Mail
Aha, I deleted the data folder and now I get Invalid Date String: '2010-01-01 00:00:00 +02:00'. I need to convert it to Solr's date format. I read it in the schema using <field name="LastModificationTime" type="date" indexed="false" stored="true" required="true"/> On Tue, Jul 23, 2013 at 10:50 AM, Gora Mohanty

Re: adding date column to the index

2013-07-23 Thread Mysurf Mail
How do I convert a datetimeoffset(7) to a Solr date? On Tue, Jul 23, 2013 at 11:11 AM, Mysurf Mail stammail...@gmail.com wrote: Aha, I deleted the data folder and now I get Invalid Date String: '2010-01-01 00:00:00 +02:00'. I need to convert it to Solr's date format, as I read it in the schema using field
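Solr date fields expect ISO 8601 in UTC ("yyyy-MM-dd'T'HH:mm:ss'Z'"), so the offset-bearing SQL value has to be normalized before indexing. A minimal client-side sketch (DIH's DateFormatTransformer is the in-import alternative); the function name is made up:

```python
from datetime import datetime, timezone

def to_solr_date(s: str) -> str:
    """Convert 'YYYY-MM-DD HH:MM:SS +HH:MM' to Solr's ISO 8601 UTC form."""
    # %z accepts a colon in the offset on Python 3.7+
    dt = datetime.strptime(s, "%Y-%m-%d %H:%M:%S %z")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_date("2010-01-01 00:00:00 +02:00"))  # → 2009-12-31T22:00:00Z
```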

Re:

2013-07-23 Thread wiredkel
Hi! http://mackieprice.org/cbs.com.network.html

Indexing Oracle Database in Solr using Data Import Handler

2013-07-23 Thread archit2112
I'm trying to index Oracle Database 10g XE using Solr's Data Import Handler. My data-config.xml looks like this: <dataConfig> <dataSource type="JdbcDataSource" driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@XXX.XXX.XXX.XXX::xe" user="XX" password="XX" /> <document name="product_info"

Document Similarity Algorithm at Solr/Lucene

2013-07-23 Thread Furkan KAMACI
Hi; Sometimes a huge part of a document may exist in another document, as in student plagiarism or quotation of a blog post in another blog post. Do Solr/Lucene or their libraries (UIMA, OpenNLP, etc.) have any class to detect this?

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-23 Thread Tommaso Teofili
Hi, I think you may leverage and/or improve the MLT component [1]. HTH, Tommaso [1] : http://wiki.apache.org/solr/MoreLikeThis 2013/7/23 Furkan KAMACI furkankam...@gmail.com Hi; Sometimes a huge part of a document may exist in another document. As like in student plagiarism or quotation of a
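For reference, a MoreLikeThis request of the kind [1] describes can be assembled like this; the core URL, uniqueKey value, and field names are hypothetical, the mlt.* parameters are the documented ones:

```python
from urllib.parse import urlencode

params = {
    "q": "id:blog-post-123",     # hypothetical uniqueKey of the suspect document
    "mlt": "true",               # enable the MoreLikeThis component on /select
    "mlt.fl": "title,content",   # fields assumed to exist in the schema
    "mlt.mintf": 2,              # minimum term frequency to consider a term
    "mlt.mindf": 5,              # minimum document frequency to consider a term
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```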

RE: Solr 4.1.0 not using solrcore.properties ?

2013-07-23 Thread sathish_ix
Hi , Can any one help on how to refer the solrcore.properties uploaded into Zookeeper ? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-1-0-not-using-solrcore-properties-tp4040228p4079654.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-23 Thread Furkan KAMACI
Actually I need a specialized algorithm. I want to use that algorithm to detect duplicate blog posts. 2013/7/23 Tommaso Teofili tommaso.teof...@gmail.com Hi, I you may leverage and / or improve MLT component [1]. HTH, Tommaso [1] : http://wiki.apache.org/solr/MoreLikeThis 2013/7/23

problems about solr replication in 4.3

2013-07-23 Thread xiaoqi
Hi all, I have two Solr instances, one master and one replica. Before, under version 3.5, they worked fine. After upgrading to 4.3, I found that when the replica copies the index from the master, it cleans its current index and copies the new version into its own folder. The slave can't search

facet.maxcount ?

2013-07-23 Thread Jérôme Étévé
Hi all happy Solr users! I was wondering if it's possible to have some sort of facet.maxcount equivalent? In short, that would exclude from the facet any term (or query) that matches at least facet.maxcount times. That facet.maxcount would probably significantly improve the performance of

RE: facet.maxcount ?

2013-07-23 Thread Markus Jelsma
Hi - No but there are two unresolved issues about this topic: https://issues.apache.org/jira/browse/SOLR-4411 https://issues.apache.org/jira/browse/SOLR-4411 Cheers -Original message- From:Jérôme Étévé jerome.et...@gmail.com Sent: Tuesday 23rd July 2013 12:58 To:

RE: facet.maxcount ?

2013-07-23 Thread Markus Jelsma
Eeh, here's the other one: https://issues.apache.org/jira/browse/SOLR-1712 -Original message- From:Markus Jelsma markus.jel...@openindex.io Sent: Tuesday 23rd July 2013 13:18 To: solr-user@lucene.apache.org Subject: RE: facet.maxcount ? Hi - No but there are two unresolved

Appending *-wildcard suffix on all terms for querying: move logic from client to server side

2013-07-23 Thread Paul Blanchaert
My client has an installation with 3 different clients using the same Solr index. These clients all append a * wildcard suffix in the query: user enters abc def while search is performed against (abc* def*). In order to move away from this way of searching, we'd like to move the clients away from
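The client-side behavior described above can be sketched as follows; a minimal illustration of the transformation, not the clients' actual code:

```python
def add_wildcards(q: str) -> str:
    """Mimic what the clients do today: append '*' to every
    whitespace-separated term that doesn't already end with one."""
    return " ".join(t if t.endswith("*") else t + "*" for t in q.split())

print(add_wildcards("abc def"))  # → abc* def*
```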

Re: highlighting required in document

2013-07-23 Thread Dmitry Kan
You just need to specify the emphasizing tag in the hl params by adding something like this to your query: hl.fl=content&hl.simple.pre=<b>&hl.simple.post=</b> Check the Solr admin page, the querying item: it shows the constructed query, so you don't need to guess! Regards, Dmitry On Mon, Jul 22,
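Assembled with a URL library, the same request parameters look like this; the field name content follows the thread, the hl.* names are stock highlighter params:

```python
from urllib.parse import urlencode

params = {
    "q": "solr",                # example query term
    "hl": "true",               # turn highlighting on
    "hl.fl": "content",         # field(s) to highlight
    "hl.simple.pre": "<b>",     # opening emphasis tag
    "hl.simple.post": "</b>",   # closing emphasis tag
}
query_string = urlencode(params)
print(query_string)
```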

Re: highlighting required in document

2013-07-23 Thread Dmitry Kan
Ah, I think I misread your question. So your question is actually: how to make Solr embed highlighting into the doc response itself. I'm not aware of such functionality. This is why you have the highlighting section in your response. On Tue, Jul 23, 2013 at 2:30 PM, Dmitry Kan solrexp...@gmail.com

Re: facet.maxcount ?

2013-07-23 Thread Jérôme Étévé
Thanks! On 23 July 2013 12:19, Markus Jelsma markus.jel...@openindex.io wrote: Eeh, here's the other one: https://issues.apache.org/jira/browse/SOLR-1712 -Original message- From:Markus Jelsma markus.jel...@openindex.io Sent: Tuesday 23rd July 2013 13:18 To:

Re: how to improve (keyword) relevance?

2013-07-23 Thread Erick Erickson
Another thing I've seen people do is something like text:(test AND pdf)^10 text:(test pdf). so docs with both terms in the text field get boosted a lot, but docs with either one will still get found. But as Jack says, you have to demonstrate a problem before you propose a solution. You say a

Re: IllegalStateException

2013-07-23 Thread Erick Erickson
There has been a _ton_ of work since 4.0, and 4.4 will be out in a day or two. I suspect the best advice is to try 4.4... Best Erick On Mon, Jul 22, 2013 at 2:54 PM, Michael Long ml...@mlong.us wrote: I'm seeing random crashes in solr 4.0 but I don't have anything to go on other than

Re: how to improve (keyword) relevance?

2013-07-23 Thread Otis Gospodnetic
To add to what Erick said, that *quantifying* is hugely important! How do you measure your search relevance improvements? How are you currently measuring it? How will you see, after you apply any changes, whether relevance was improved and how much? How will you know whether, even test queries you

Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-23 Thread Erick Erickson
Neil: Here's a must-read blog about why allocating more memory to the JVM than Solr requires is a Bad Thing: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html It turns out that you actually do yourself harm by allocating more memory to the JVM than it really needs. Of

Re: Appending *-wildcard suffix on all terms for querying: move logic from client to server side

2013-07-23 Thread Mikhail Khludnev
It can be done by extending LuceneQParser/SolrQueryParser, see http://wiki.apache.org/solr/SolrPlugins#QParserPlugin. There is newTermQuery(Term); it should be overridden to delegate to the newPrefixQuery() method. Overall, I suggest you consider using EdgeNGramTokenFilter at index time, and then

Re: Running Solr 4 on Sun vs OpenJDK JVM?

2013-07-23 Thread Otis Gospodnetic
Hi Cosimo, Very simple: Oracle 1.7 is your best bet. If you have a large heap and are seeing STW pauses, try G1 - we've been using it and have been happy with it. Ciao, Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jul

Re: Solr 4.3.1 - SolrCloud nodes down and lost documents

2013-07-23 Thread Otis Gospodnetic
Hi, On Tue, Jul 23, 2013 at 8:02 AM, Erick Erickson erickerick...@gmail.com wrote: Neil: Here's a must-read blog about why allocating more memory to the JVM than Solr requires is a Bad Thing: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html It turns out that you

Re: softCommit doesn't work - ?

2013-07-23 Thread Erick Erickson
First a minor nit. The server.add(doc, time) is a hard commit, not a soft one. But the rest of it. When you add your 70 docs, do they all have the same id (i.e. the uniqueKey field). If so, there will be only one document, the last one since all the earlier ones will be overwritten. Not quite

Re: Question about field boost

2013-07-23 Thread Erick Erickson
this isn't doing what you think. title^10 content is actually parsed as text:title^100 text:content where text is my default search field. assuming title is a field. If you look a little farther up the debug output you'll see that. You probably want title:content^100 or some such? Erick On

solr - Deleting a row from the index, using the configuration files only.

2013-07-23 Thread Mysurf Mail
I am updating my solr index using deltaQuery and deltaImportQuery attributes in data-config.xml. In my condition I write: where MyDoc.LastModificationTime > '${dataimporter.last_index_time}' Then after I add a row I trigger an update using data-config.xml. Now, sometimes I delete a row. How

Re: dataimporter, custom fields and parsing error

2013-07-23 Thread Andreas Owen
I have tried post.jar and it works when I set the literal.id in solrconfig.xml. I can't pass the id with post.jar (-Dparams=literal.id=abc) because I get an error: could not find or load main class .id=abc. On 20. Jul 2013, at 7:05 PM, Andreas Owen wrote: path was set, text wasn't, but it

zkHost in solr.xml goes missing after SPLITSHARD using Collections API

2013-07-23 Thread Ali, Saqib
Hello all, Every time I issue a SPLITSHARD using Collections API, the zkHost attribute in the solr.xml goes missing. I have to manually edit the solr.xml to add zkHost after every SPLITSHARD. Any thoughts on what could be causing this? Thanks.

Start independent Zookeeper from within Solr install

2013-07-23 Thread Upayavira
Assumptions: * you currently have two choices to start Zookeeper: run it embedded within Solr, or download it from the ZooKeeper site and start it independently. * everything you need to run ZooKeeper (embedded or not) is included within the Solr distribution Assuming I've got the above

filter query result by user

2013-07-23 Thread Mysurf Mail
I want to restrict the returned results to only the documents that were created by the user. I load the CreatedBy attribute into the index and set it to indexed=false, stored=true: <field name="CreatedBy" type="string" indexed="false" stored="true" required="true"/> Then I want to filter by

Re: filter query result by user

2013-07-23 Thread Raymond Wiker
Simple: the field needs to be indexed in order to search (or filter) on it. On Tue, Jul 23, 2013 at 3:26 PM, Mysurf Mail stammail...@gmail.com wrote: I want to restrict the returned results to be only the documents that were created by the user. I then load to the index the createdBy

Re: filter query result by user

2013-07-23 Thread Otis Gospodnetic
Moreover, you may want to use fq=CreatedBy:user1 for filtering. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jul 23, 2013 at 9:28 AM, Raymond Wiker rwi...@gmail.com wrote: Simple: the field needs to be indexed in order

Re: filter query result by user

2013-07-23 Thread Mysurf Mail
But I don't want it to be searched on. Let's say the user name is giraffe. I do want the filter to be where CreatedBy = giraffe, but when the user searches his name, I want only documents with the name Giraffe. Since it is indexed, wouldn't it return all rows created by him? Thanks. On Tue, Jul

Re:

2013-07-23 Thread Gary Young
Can anyone remove this spammer please? On Tue, Jul 23, 2013 at 4:47 AM, wired...@yahoo.com wrote: Hi! http://mackieprice.org/cbs.com.network.html

Re: filter query result by user

2013-07-23 Thread Mysurf Mail
I am probably using it wrong. http://...:8983/solr/vault10k/select?q=*%3A*&defType=edismax&qf=CreatedBy%BLABLA returns all rows. It ignores my qf filter. Should I even use qf for filtering with edismax? (It doesn't say that in the doc

Re: filter query result by user

2013-07-23 Thread Otis Gospodnetic
Hi, Use fq, not qf. It needs to be indexed. Filtering is like searching without scoring. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jul 23, 2013 at 9:39 AM, Mysurf Mail stammail...@gmail.com wrote: I am probably

Re: filter query result by user

2013-07-23 Thread Jack Krupansky
There is no such thing as a qf filter - qf is simply a list of names of fields to search for the terms from the query, q, as well as boost factors. Filtering is done with filter queries - fq. -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Tuesday, July 23, 2013 9:39 AM
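Rewritten as a filter query, the request from earlier in the thread would carry the restriction in fq; a sketch (CreatedBy and the user value follow the thread, and the field must be indexed for this to match):

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",                  # the user's actual search terms go here
    "fq": "CreatedBy:giraffe",   # filter query: restricts results without affecting scoring
}
query_string = urlencode(params)
print(query_string)
```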

Re: zkHost in solr.xml goes missing after SPLITSHARD using Collections API

2013-07-23 Thread Alan Woodward
Can you try upgrading to the just-released 4.4? Solr.xml persistence had all kinds of bugs in 4.3, which should have been fixed now. Alan Woodward www.flax.co.uk On 23 Jul 2013, at 13:36, Ali, Saqib wrote: Hello all, Every time I issue a SPLITSHARD using Collections API, the zkHost

Re: solr - Deleting a row from the index, using the configuration files only.

2013-07-23 Thread Alexandre Rafalovitch
Did you look at: *) $deleteDocById *) $deleteDocByQuery *) deletedPkQuery Just search for delete on https://wiki.apache.org/solr/DataImportHandler If you tried all of those, maybe you need to explain your problem in more specific details. Regards, Alex. Personal website:
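The deletedPkQuery route wires into the same DIH entity as the delta queries. A hedged sketch; MyDoc and LastModificationTime follow the original question, while the MyDoc_Deleted audit table and DeletedAt column are hypothetical names for wherever deletions are recorded:

```
<entity name="doc" pk="ID"
        query="SELECT ID, LastModificationTime FROM MyDoc"
        deltaQuery="SELECT ID FROM MyDoc
                    WHERE LastModificationTime &gt; '${dataimporter.last_index_time}'"
        deletedPkQuery="SELECT ID FROM MyDoc_Deleted
                        WHERE DeletedAt &gt; '${dataimporter.last_index_time}'"/>
```

On the next delta-import, DIH runs deletedPkQuery and removes the returned primary keys from the index, so no client-side delete call is needed.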

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-23 Thread Jack Krupansky
One classic approach is to simply use the full text of the suspect text as well as bigrams and trigrams (phrases) from that text with OR operators. The top results will be the documents that most closely match the subject text. That provides a visual set of similar results. You will then have to
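Once candidate documents are retrieved that way, the degree of copying can be quantified client-side with word-shingle overlap; a minimal sketch, independent of Solr (not what MLT does internally):

```python
def shingles(text: str, n: int = 3):
    """Set of word n-grams ('shingles') of a text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets;
    values near 1.0 suggest heavy verbatim copying."""
    sa, sb = shingles(a, n), shingles(b, n)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0
```

A threshold on this score is then a tunable knob, which is the kind of fine-tuning the rest of the thread discusses.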

Re: how number of indexed fields effect performance

2013-07-23 Thread Alexandre Rafalovitch
Do you need all of the fields loaded every time and are they stored? Maybe there is a document with gigantic content that you don't actually need but it gets deserialized anyway. Try lazy loading setting: enableLazyFieldLoading in solrconfig.xml Regards, Alex. Personal website:

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-23 Thread Shawn Heisey
On 7/23/2013 3:33 AM, Furkan KAMACI wrote: Sometimes a huge part of a document may exist in another document. As like in student plagiarism or quotation of a blog post at another blog post. Does Solr/Lucene or its libraries (UIMA, OpenNLP, etc.) has any class to detect it? Solr is designed

Collection not current after insert

2013-07-23 Thread Alistair Young
Hi there, My Solr is being fed by Fedora GSearch and when uploading a new resource, the Collection is optimized but not current so the new resource can't be found. I have to go to the Core Admin page and Optimize it from there, in order to make the collection current. Is there anything I

Re: deserializing highlighting json result

2013-07-23 Thread Jack Krupansky
The JSON keys within the highlighting object are the document IDs, and then the keys within those objects are the highlighted field names. Again, I repeat my question: Exactly why is it difficult to deserialize? Seems simple enough. -- Jack Krupansky -Original Message- From: Mysurf

Re: Appending *-wildcard suffix on all terms for querying: move logic from client to server side

2013-07-23 Thread Paul Blanchaert
Thanks Mikhail, I'll go for your EdgeNGramTokenFilter suggestion. - Kind regards, Paul

Re: Start independent Zookeeper from within Solr install

2013-07-23 Thread Timothy Potter
Curious what the use case is for this? Zookeeper is not an HTTP service so loading it in Jetty by itself doesn't really make sense. I also think this creates more work for the Solr team especially since setting up a production ensemble shouldn't take more than a few minutes once you have the nodes

Re: zkHost in solr.xml goes missing after SPLITSHARD using Collections API

2013-07-23 Thread Shawn Heisey
On 7/23/2013 7:50 AM, Alan Woodward wrote: Can you try upgrading to the just-released 4.4? Solr.xml persistence had all kinds of bugs in 4.3, which should have been fixed now. The 4.4.0 release has been finalized and uploaded, but the download link hasn't been changed yet because the mirror

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-23 Thread Tommaso Teofili
if you need a specialized algorithm for detecting blogposts plagiarism / quotations (which are different tasks IMHO) I think you have 2 options: 1. implement a dedicated one based on your features / metrics / domain 2. try to fine tune an existing algorithm that is flexible enough If I were to do

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-23 Thread Furkan KAMACI
Thanks for your comments. 2013/7/23 Tommaso Teofili tommaso.teof...@gmail.com if you need a specialized algorithm for detecting blogposts plagiarism / quotations (which are different tasks IMHO) I think you have 2 options: 1. implement a dedicated one based on your features / metrics / domain

WikipediaTokenizer for Removing Unnecesary Parts

2013-07-23 Thread Furkan KAMACI
Hi; I have indexed wikipedia data with Solr DIH. However when I look at the data that is indexed in Solr I see something like this as well: {| style=text-align: left; width: 50%; table-layout: fixed; border=0 |- valign=top | style=width: 50%| :*[[Ubuntu]] :*[[Fedora]] :*[[Mandriva]] :*[[Linux Mint]]

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-23 Thread Shashi Kant
Here is a paper that I found useful: http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf On Tue, Jul 23, 2013 at 10:42 AM, Furkan KAMACI furkankam...@gmail.com wrote: Thanks for your comments. 2013/7/23 Tommaso Teofili tommaso.teof...@gmail.com if you need a specialized

Re: how number of indexed fields effect performance

2013-07-23 Thread Jack Krupansky
There was also a bug in the lazy loading of multivalued fields at one point recently in Solr 4.2 https://issues.apache.org/jira/browse/SOLR-4589 4.x + enableLazyFieldLoading + large multivalued fields + varying fl = pathological CPU load & response time. Do you use multivalued fields very

Re: WikipediaTokenizer for Removing Unnecesary Parts

2013-07-23 Thread Robert Muir
If you use WikipediaTokenizer it will tag different wiki elements with different types (you can see it in the admin UI). Then follow up with TypeTokenFilter to keep only the types you care about, and I think it will do what you want. On Tue, Jul 23, 2013 at 7:53 AM, Furkan KAMACI
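A sketch of that chain as a fieldType, assuming the types file name is hypothetical and hedging on the exact filter attributes (check TypeTokenFilterFactory's documentation for your Solr version):

```
<fieldType name="text_wiki" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WikipediaTokenizerFactory"/>
    <!-- types_wiki.txt is a hypothetical file listing the token types to keep;
         useWhitelist="true" keeps (rather than drops) the listed types -->
    <filter class="solr.TypeTokenFilterFactory" types="types_wiki.txt"
            useWhitelist="true"/>
  </analyzer>
</fieldType>
```

The admin UI analysis page shows each token's type, which is the easiest way to decide what belongs in the types file.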

Re: WikipediaTokenizer for Removing Unnecesary Parts

2013-07-23 Thread Jack Krupansky
Are you actually seeing that output from the WikipediaTokenizerFactory?? Really? Even if you use the Solr Admin UI analysis page? You should just see the text tokens plus the URLs for links. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Tuesday, July 23, 2013 10:53

Re: softCommit doesn't work - ?

2013-07-23 Thread tskom
Thanks for your comment Erick. When I use *server.add(doc);* - everything is fine (but it takes a long time to hard commit every single doc), so I am sure docs are uniquely indexed. Maybe I shouldn't do *server.commit();* at all from solrj code, so SOLR would use autoCommit/autoSoftCommit

Re: XInclude and Document Entity not working on schema.xml

2013-07-23 Thread Elodie Sannier
Hello Chris, Thank you for your help. I checked differences between my files and your test files but I didn't find bugs in my files. All my files are in the same directory: collection1/conf = schema.xml content: <?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE schema [ <!ENTITY

Re: WikipediaTokenizer for Removing Unnecesary Parts

2013-07-23 Thread Furkan KAMACI
Here is my fieldtype: <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WikipediaTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_tr.txt"

Re:

2013-07-23 Thread Chris Hostetter
: Can anyone remove this spammer please? The recent influx is not confined to a single user, or a single list. Nor is there a clear course of action just yet, since the senders in question are all legitimate subscribers who have been active members of the community. There is an open issue

RE: spellcheck and search in a same solr request

2013-07-23 Thread Dyer, James
Solr doesn't support any kind of short-circuiting the original query and returning the results of the corrected query or collation. You just re-issue the query in a second request. This would be a nice feature to add though. James Dyer Ingram Content Group (615) 213-4311 -Original

RE: Use same spell check dictionary across different collections

2013-07-23 Thread Dyer, James
DirectSolrSpellChecker does not prepare any kind of dictionary. It just uses the term dictionary from the indexed field. So what you are trying to do is impossible. You would think it would be possible with IndexBasedSpellChecker because it creates a dictionary as a sidecar lucene index.

[ANNOUNCE] Apache Solr 4.4 released

2013-07-23 Thread Steve Rowe
July 2013, Apache Solr™ 4.4 available The Lucene PMC is pleased to announce the release of Apache Solr 4.4 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search,

Re: Start independent Zookeeper from within Solr install

2013-07-23 Thread Upayavira
The use case is to prevent the necessity to download something else (zookeeper) when everything needed to run it is (likely) present in the Solr distribution already. Maybe we don't need to start Jetty, maybe we can start Zookeeper with an extra script in the Solr codebase. At present, if you

Re: custom field type plugin

2013-07-23 Thread Kevin Stone
What are the dangers of trying to use a range of 10 billion? Simply a slower index time? Or will I get inaccurate results? I have tried it on a very small sample of documents, and it seemed to work. I could spend some time this week trying to get a more robust (and accurate) dataset loaded to play

Re: XInclude and Document Entity not working on schema.xml

2013-07-23 Thread Chris Hostetter
Elodie: I just tested your configs (as close as i could get since i don't have the com.kelkoo classes) using the current HEAD of the 4x branch and had no problems with the entity includes. what java version/vendor are you using? are you using the provided jetty or your own servlet

Re: Collection not current after insert

2013-07-23 Thread Michael Della Bitta
Hi Alistair, You probably need a commit, and not an optimize. Which version of Solr are you running against? The 4.0 releases have more complications, but generally sending a commit will do. Not sure if GSearch sends one, only partly because I never was able to make it work. :) Michael Della

Re: custom field type plugin

2013-07-23 Thread David Smiley (@MITRE.org)
Oh cool! I'm glad it at least seemed to work. Can you post your configuration of the field type and report from Solr's logs what the maxLevels is used for this field, which is logged the first time you use the field type? Maybe there isn't a limit under 10B after all. Some quick'n'dirty

Re: Calculating Solr document score by ignoring the boost field.

2013-07-23 Thread Chris Hostetter
: Ok thanks, I just wanted to know whether it is possible to ignore the boost value : during score calculation, and as you said it's not. : Now I would have to focus on Nutch to fix the issue and not send boost=0 : to Solr. The index-time boosts are encoded in field norms -- if you want to ignore

Re:

2013-07-23 Thread Gora Mohanty
On 23 July 2013 21:52, Chris Hostetter hossman_luc...@fucit.org wrote: : Can anyone remove this spammer please? The recent influx is not confined to a single user, or a single list. Nor is there a clear course of action just yet, since the senders in question are all legitimate subscribers

Re: Question about field boost

2013-07-23 Thread Joe Zhang
I'm not sure I understand, Erick. I don't have a text field in my schema; title and content are both legal fields. On Tue, Jul 23, 2013 at 5:15 AM, Erick Erickson erickerick...@gmail.comwrote: this isn't doing what you think. title^10 content is actually parsed as text:title^100

Spellcheck field element and collation issues

2013-07-23 Thread Brendan Grainger
Hi All, I have an IndexBasedSpellChecker component configured as follows (note the field parameter is set to the spellcheck field): <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <str name="queryAnalyzerFieldType">text_spell</str> <lst name="spellchecker"> <str

RE: Spellcheck field element and collation issues

2013-07-23 Thread Dyer, James
For this query: http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0 ...do you get anything back in the spellcheck response? Is it correcting the individual words and not giving collations? Or are you getting no individual word suggestions either? James Dyer Ingram

how number of indexed fields effect performance

2013-07-23 Thread Suryansh Purwar
Hi, Thanks for your suggestions. I'll be able to provide answers to a few of your questions right now rest I'll answer after some time. It takes around 150k to 200k queries before it goes down again after restarting it. In a typical query we are returning around 20 fields. Memory utilization

Re: Spellcheck field element and collation issues

2013-07-23 Thread Brendan Grainger
Hi James, I get the following response for that query: <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">8</int> <lst name="params"> <str name="indent">true</str> <str name="q">Perfrm HVC</str> <str name="rows">0</str> </lst> </lst> <result name="response" numFound="0" start="0"/> <lst name="spellcheck"> <lst

Re: Node down, but not out

2013-07-23 Thread jimtronic
I think the best bet here would be a ping-like handler that simply returns the state of only this box in the cluster: something like /admin/state, which would return down, active, leader, or recovering. I'm not really sure where to begin, however. Any ideas? jim On Mon, Jul 22, 2013 at 12:52 PM,

Re: Spellcheck field element and collation issues

2013-07-23 Thread Brendan Grainger
Hi James, If I try: http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0 I get the same result: <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">7</int> <lst name="params"> <str name="indent">true</str> <str name="q">Perfrm HVC</str> <str

RE: Spellcheck field element and collation issues

2013-07-23 Thread Dyer, James
Try tacking maxCollationTries=0 to the URL and see if the collation returns. If you get a collation, then try the same URL with the collation as the q parameter. Does that get results? My suspicion here is that you are assuming that markup_texts is the default search field for /select but in

socket write error Solrj 4.3.1

2013-07-23 Thread franagan
Hi all, I'm testing SolrCloud (version 4.3.1) with 2 shards and 1 external ZooKeeper. All is running OK: documents are indexed into the 2 different shards, and select *:* gives me all documents. Now I'm trying to add/index a new document via SolrJ using CloudSolrServer. *the code:*

Re: zkHost in solr.xml goes missing after SPLITSHARD using Collections API

2013-07-23 Thread Ali, Saqib
Thanks Alan and Shawn. Just installed Solr 4.4, and no longer experiencing the issue. Thanks! :) On Tue, Jul 23, 2013 at 7:21 AM, Shawn Heisey s...@elyograg.org wrote: On 7/23/2013 7:50 AM, Alan Woodward wrote: Can you try upgrading to the just-released 4.4? Solr.xml persistence had all

Processing a lot of results in Solr

2013-07-23 Thread Matt Lieber
Hello Solr users, A question regarding processing a lot of docs returned from a query: I potentially have millions of documents returned from a query. What is the common design to deal with this? 2 ideas I have are: - create a client service that is multithreaded to handle this - Use the

RE: Spellcheck field element and collation issues

2013-07-23 Thread Dyer, James
I don't believe you can specify more than 1 field on df (default field). What you want, I think, is qf (query fields), which is available only if using dismax/edismax. http://wiki.apache.org/solr/SearchHandler#df http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29 James Dyer

Re: socket write error Solrj 4.3.1

2013-07-23 Thread franagan
For people who have the same issue, it was solved by adding: <str name="fmap.content">text</str> in the /update/extract requestHandler in solrconfig.xml: <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler"> <lst name="defaults"> <str

Re: Spellcheck field element and collation issues

2013-07-23 Thread Brendan Grainger
Thanks James. That's it! Now: http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0 returns: <lst name="collation"> <str name="collationQuery">perform hvac</str> <int name="hits">4</int> <lst name="misspellingsAndCorrections"> <str name="perfrm">perform</str> <str name="hvc">hvac</str>

maximum number of documents per shard?

2013-07-23 Thread Ali, Saqib
still 2.1 billion documents?

RE: Spellcheck field element and collation issues

2013-07-23 Thread Dyer, James
You've got it. The only other thing is that spellcheck.q does not analyze anything. The whole purpose of this is to allow you to just send raw keywords to be spellchecked. This is handy if you have a complex q parameter (say, you're using local params, etc) and the SpellingQueryConverter

How to make soft commit more reliable?

2013-07-23 Thread SolrLover
Currently I am using SOLR 3.5.X and I push updates to SOLR via queue (Active MQ) and perform hard commit every 30 minutes (since my index is relatively big around 30 million documents). I am thinking of using soft commit to implement NRT search but I am worried about the reliability. For ex: If I

Re: Spellcheck field element and collation issues

2013-07-23 Thread Brendan Grainger
Perfect thanks so much. You just cleared up the other little bit, i.e. when the SpellingQueryConverter is used/not used and why you might implement your own. Thanks again. On Tue, Jul 23, 2013 at 6:48 PM, Dyer, James james.d...@ingramcontent.comwrote: You've got it. The only other thing is

Re: maximum number of documents per shard?

2013-07-23 Thread Jack Krupansky
2.1 billion documents (including deleted documents) per Lucene index, but essentially per Solr shard as well. But don’t even think about going that high. In fact, don't plan on going above 100 million unless you do a proof of concept that validates that you get acceptable query and update

Re: problems about solr replication in 4.3

2013-07-23 Thread Erick Erickson
Are you mixing SolrCloud and old-style master/slave? There was a bug a while ago (search the JIRA) where replication was copying the entire index unnecessarily, but I think that was fixed by 4.3. Best Erick On Tue, Jul 23, 2013 at 6:33 AM, xiaoqi belivexia...@gmail.com wrote: hi,all i have

Re: softCommit doesn't work - ?

2013-07-23 Thread Erick Erickson
Right, issuing a commit after every document is not good practice. Relying on the auto commit parameters in solrconfig.xml is usually best, although I will sometimes issue a commit at the very end of the indexing run. Several things about this thread aren't making sense. First of all your
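Relying on the auto commit parameters means configuring them in solrconfig.xml; a minimal sketch, with illustrative values rather than recommendations:

```
<!-- hard commit: durability; openSearcher=false avoids reopening searchers -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- soft commit: visibility of new docs, cheap enough to run often -->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
```

With this in place, the SolrJ client can call server.add(doc) without any explicit commit and let the server make documents durable and visible on its own schedule.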

Re: Question about field boost

2013-07-23 Thread Erick Erickson
Bah! I didn't notice that you'd used edismax, ignore my comments. Sorry for the confusion Erick On Tue, Jul 23, 2013 at 2:34 PM, Joe Zhang smartag...@gmail.com wrote: I'm not sure I understand, Erick. I don't have a text field in my schema; title and content are both legal fields. On Tue,

Re: Processing a lot of results in Solr

2013-07-23 Thread Timothy Potter
Hi Matt, This feature is commonly known as deep paging and Lucene and Solr have issues with it ... take a look at http://solr.pl/en/2011/07/18/deep-paging-problem/ as a potential starting point using filters to bucketize a result set into sets of sub result sets. Cheers, Tim On Tue, Jul 23,
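Plain start/rows paging is the baseline the linked post improves on; a sketch of the window arithmetic, with the cost caveat that motivates the filter-bucketing approach:

```python
def pages(total_hits: int, rows: int = 1000):
    """Yield (start, rows) windows for start/rows paging.

    Note: for each request Solr re-collects and discards the first
    'start' docs, so cost grows with depth -- hence bucketing very
    deep result sets into sub result sets via filters instead.
    """
    for start in range(0, total_hits, rows):
        yield start, min(rows, total_hits - start)

print(list(pages(2500, 1000)))  # → [(0, 1000), (1000, 1000), (2000, 500)]
```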

Re: custom field type plugin

2013-07-23 Thread Kevin Stone
Sorry for the late response. I needed to find the time to load a lot of extra data (closer to what we're anticipating). I have an index with close to 220,000 documents, each with at least two coordinate regions anywhere between -10 billion to +10 billion, but could potentially have up to maybe

Re: Processing a lot of results in Solr

2013-07-23 Thread Roman Chyla
Hello Matt, You can consider writing a batch processing handler, which receives a query and instead of sending results back, writes them into a file which is then available for streaming (it has its own UUID). I am dumping many GBs of data from Solr in a few minutes - your query + streaming

Re: Processing a lot of results in Solr

2013-07-23 Thread Matt Lieber
That sounds like a satisfactory solution for the time being - I am assuming you dump the data from Solr in a csv format? How did you implement the streaming processor ? (what tool did you use for this? Not familiar with that) You say it takes a few minutes only to dump the data - how long does it

Re: custom field type plugin

2013-07-23 Thread Smiley, David W.
Kevin, Those are some good query response times but they could be better. You've configured the field type sub-optimally. Look again at http://wiki.apache.org/solr/SpatialForTimeDurations and note in particular maxDistErr. You've left it at the value that comes pre-configured with Solr,
