Re: Collection - loadOnStartup

2013-08-06 Thread Srivatsan
If so, how do I set loadOnStartup for the Collections API in Solr 4.4? -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-loadOnStartup-tp4082531p4082731.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-06 Thread Raymond Wiker
Ok, let me rephrase that slightly: does your database extraction include BLOBs or CLOBs that are actually complete documents, that might be UTF-8 encoded text? From the stack trace in your second post, it seems that the error occurs while parsing an XML file uploaded via the UpdateRequestHandler.

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-06 Thread Federico Chiacchiaretta
2013/8/6 Raymond Wiker rwi...@gmail.com wrote: Ok, let me rephrase that slightly: does your database extraction include BLOBs or CLOBs that are actually complete documents, that might be UTF-8 encoded text? It definitely does: each entry I have in PostgreSQL has a field of type text that includes

Re: Boosting in function queries?

2013-08-06 Thread Upayavira
Try: <str name="q">_query_:"{!dismax qf=Fname^8.0 v=$f_name}" OR _query_:"{!dismax qf=Lname^8.0 v=$l_name}"</str> If you are using one of the later 4.x releases, you might find you can do away with the _query_: <str name="q">{!dismax qf=Fname^8.0 v=$f_name} OR {!dismax qf=Lname^8.0 v=$l_name}</str> I
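A minimal Java sketch of the second form above, using only the JDK to build and URL-encode the request parameters. The host, the core path, and the sample values "John" and "Smith" are assumptions for illustration; `f_name` and `l_name` are the parameter names from the post. `URLEncoder.encode(String, Charset)` needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class NestedDismaxQuery {
    // Builds the parameters for the second form in the post: two {!dismax}
    // sub-queries whose query text is dereferenced from other request
    // parameters via v=$f_name and v=$l_name.
    static Map<String, String> buildParams(String firstName, String lastName) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "{!dismax qf=Fname^8.0 v=$f_name} OR {!dismax qf=Lname^8.0 v=$l_name}");
        params.put("f_name", firstName);
        params.put("l_name", lastName);
        return params;
    }

    // URL-encodes each key=value pair and joins them with '&'.
    // URLEncoder.encode(String, Charset) requires Java 10 or later.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Hypothetical core URL and sample names; adjust to your setup.
        String url = "http://localhost:8983/solr/select?" + toQueryString(buildParams("John", "Smith"));
        System.out.println(url);
    }
}
```

Keeping the user's input in separate parameters and pointing at it with `v=$...` means the input never has to be escaped inside the `{!...}` local-params syntax.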

Re: Encountered invalid class name

2013-08-06 Thread Artem Karpenko
I'm not a JBoss expert, but I'm pretty sure it should work fine. The validator does throw warnings, that's true, but those warnings do not seem to affect the class-loading process. I suggest you have a look at this diff

Re: Encountered invalid class name

2013-08-06 Thread anpm1989
That's right; after checking ServiceLoaderProcessor I came to the same conclusion: it's just a warning, and the className is still added to the list. Thank you very much, Artem. Best wishes, An Pham Minh

Re: solr - using fq parameter does not retrieve an answer

2013-08-06 Thread Mysurf Mail
Thanks. On Mon, Aug 5, 2013 at 4:57 PM, Shawn Heisey s...@elyograg.org wrote: On 8/5/2013 2:35 AM, Mysurf Mail wrote: When I query using http://localhost:8983/solr/vault/select?q=*:* I get results including the following doc ... ... <int name="VersionNumber">7</int> ...

Knowing what field caused the retrieval of the document

2013-08-06 Thread Mysurf Mail
I have two indexed fields in my document: Name and Comment. The user searches for a phrase, and I need to act differently depending on whether it appeared in the comment or in the name. Is there a way to know why a document was retrieved? Thanks.

Re: Knowing what field caused the retrieval of the document

2013-08-06 Thread Raymond Wiker
If you were searching for single words (terms), you could use the 'tf' function by adding something like matchesinname:tf(name,'whatever') to the 'fl' parameter; if the 'name' field contains "whatever", the (result) field 'matchesinname' will be at least 1 (it holds the term frequency). On Tue, Aug 6, 2013 at 10:24 AM, Mysurf Mail
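A minimal Java sketch of the tf() approach, assuming the two fields from the question are called `name` and `comment`. It builds the `fl` value with one pseudo-field per field, then reads back which pseudo-fields were non-zero for a document; the returned values shown in `main` are illustrative, not real Solr output. As the post says, this only works for single terms, and the term is passed as typed, so it should be given in its indexed form (for example lowercased).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TermMatchFields {
    // Builds an fl value returning all stored fields, the score, and one
    // pseudo-field per searched field holding tf(field,'term'), i.e. how many
    // times that single term occurs in the field. Quotes inside the term are
    // not escaped in this sketch.
    static String flFor(String term, String... fields) {
        StringBuilder fl = new StringBuilder("*,score");
        for (String field : fields) {
            fl.append(",matchesin").append(field)
              .append(":tf(").append(field).append(",'").append(term).append("')");
        }
        return fl.toString();
    }

    // Given the pseudo-field values read back from one result document,
    // returns the names of the pseudo-fields whose term frequency is non-zero.
    static Set<String> matchedFields(Map<String, ? extends Number> pseudoFields) {
        Set<String> matched = new TreeSet<>();
        for (Map.Entry<String, ? extends Number> e : pseudoFields.entrySet()) {
            if (e.getValue().doubleValue() > 0) {
                matched.add(e.getKey());
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        System.out.println(flFor("whatever", "name", "comment"));
        // Values as they might come back for one document (illustrative only).
        System.out.println(matchedFields(Map.of("matchesinname", 1, "matchesincomment", 0)));
    }
}
```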

How to plan field boosting

2013-08-06 Thread Mysurf Mail
I query using qf=Name+Tag. Now I want documents that have the phrase in Tag to come first, so I use qf=Name+Tag^2, and they do appear first. What is the rule of thumb for the number that comes after the field? How do I know what number to set?

Re: Transform data at index time: country - continent

2013-08-06 Thread Christian Köhler - ZFMK
On 05.08.2013 15:52, Jack Krupansky wrote: You can write a brute force JavaScript script using the StatelessScript update processor that hard-codes the mapping. I'll probably do something like this. Unfortunately I have no influence on the original db itself, so I have to fix this in Solr.

Solr MaxCollections

2013-08-06 Thread Srivatsan
Hi, I am using Solr 4.3 for my search application with Apache ZooKeeper 3.4.5. I came across ZooKeeper's znode size limit. The default is 1 MB, right? I read an article saying that the znode size reaches 1 MB with just 1000 collections. Is that so? And is it preferable to increase the znode size to

Re: Knowing what field caused the retrival of the document

2013-08-06 Thread Mysurf Mail
But what if this is for multiple words? I am guessing Solr knows why the document is there, since I get to see the paragraph in the highlighting (hl) section. On Tue, Aug 6, 2013 at 11:36 AM, Raymond Wiker rwi...@gmail.com wrote: If you were searching for single words (terms), you could use the 'tf'

Re: Transform data at index time: country - continent

2013-08-06 Thread Raymond Wiker
Another option might be to use a pre-existing web service... it should be relatively easy to add that to your dataimporthandler configuration (if you're using DIH, that is :-) A quick google search gave me http://www.geonames.org; see http://www.geonames.org/export/ for API information. On Tue,

Re: Transform data at index time: country - continent

2013-08-06 Thread Christian Köhler - ZFMK
Hi, On 06.08.2013 12:56, Raymond Wiker wrote: Another option might be to use a pre-existing web service... it should be relatively easy to add that to your dataimporthandler configuration (if you're using DIH, that is :-) A quick google search gave me http://www.geonames.org; see

Re: Customize Velocity Output, Utility Class or Custom Tool

2013-08-06 Thread Erick Erickson
_Everyone_ is qualified to submit a patch, it just takes some additional karma to be able to commit it to the code line. So please do create and attach any patch you'd like to a JIRA! Best Erick On Mon, Aug 5, 2013 at 4:39 PM, O. Olson olson_...@yahoo.it wrote: Thank you very much *Erik*.

Re: Collection - loadOnStartup

2013-08-06 Thread Erick Erickson
I don't think you can, really. Collections, at this point, is more geared towards SolrCloud. The idea of lazy loading matched with SolrCloud makes my head hurt. I'm afraid for the nonce you'll have to individually edit the solr.xml or core.properties files on the nodes once the collections are
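Erick's workaround is to edit each node's core.properties by hand. A minimal Java sketch of that edit, assuming the core-discovery core.properties format in which loadOnStartup is a per-core key; the core name is hypothetical, and the text is handled in memory to keep the sketch self-contained (real code would read and write the file on each node). Whether lazy loading behaves sensibly under SolrCloud is exactly what Erick doubts, so treat this as an experiment.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

class CoreLoadOnStartup {
    // Parses the text of a core.properties file and sets loadOnStartup.
    static Properties withLoadOnStartup(String corePropertiesText, boolean loadOnStartup) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(corePropertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        props.setProperty("loadOnStartup", Boolean.toString(loadOnStartup));
        return props;
    }

    // Serializes the properties back to text; Properties.store writes a
    // timestamp comment line first.
    static String toText(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical core name, for illustration only.
        Properties props = withLoadOnStartup("name=collection1_shard1_replica1\n", false);
        System.out.print(toText(props));
    }
}
```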

Re: Unexpected behavior when sorting groups

2013-08-06 Thread Paul Masurel
On Mon, Aug 5, 2013 at 2:42 AM, Tony Paloma to...@valvesoftware.com wrote: Thanks Paul. That's helpful. I'm not familiar with the concept of custom caches. Would this be custom Java code or something defined in the config/schema? Can you point me to some documentation? My solution requires

Adding Postgres and Mysql JDBC drivers to Solr

2013-08-06 Thread Spadez
Hi, I am running Solr4 on Jetty9 and I am trying to include the JDBC drivers for both MySQL and PostgreSQL. I'm a little confused about how to do this. I believe these are the two files I need: http://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-5.1.26.tar.gz
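Before wiring the drivers into Solr, it can help to confirm that the jars you extracted from the downloads really contain the driver classes. A minimal Java sketch, assuming the standard driver class names for MySQL Connector/J 5.1 (`com.mysql.jdbc.Driver`) and the PostgreSQL JDBC driver (`org.postgresql.Driver`). It only checks a plain JVM classpath, not the lib directories Solr itself loads jars from, so it says nothing about where the jars must go in Solr.

```java
class JdbcDriverCheck {
    // Returns true if the named class can be loaded (without initializing it)
    // by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, JdbcDriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Standard driver class names for MySQL Connector/J 5.1 and the
        // PostgreSQL JDBC driver.
        String[] drivers = {"com.mysql.jdbc.Driver", "org.postgresql.Driver"};
        for (String driver : drivers) {
            System.out.println(driver + " found: " + isOnClasspath(driver));
        }
    }
}
```

Run it with the extracted jars on the classpath, for example `java -cp .:/path/to/driver.jar JdbcDriverCheck` (the path is a placeholder).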

Re: Measuring SOLR performance

2013-08-06 Thread Dmitry Kan
Hi Roman, With a fresh checkout, the reported admin_endpoint is: http://localhost:8983/solr/admin. This URL redirects to http://localhost:8983/solr/#/ . I'm using Solr 4.3.1. Does your tool support this version? Of the three URLs you asked for, only the 3rd one gave a response:

Re: Measuring SOLR performance

2013-08-06 Thread Shawn Heisey
On 8/6/2013 6:17 AM, Dmitry Kan wrote: Of the three URLs you asked for, only the 3rd one gave a response: [snip] The rest report 404. On Mon, Aug 5, 2013 at 8:38 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, So I think the admin pages are different on your version of solr, what do you

Re: Adding Postgres and Mysql JDBC drivers to Solr

2013-08-06 Thread Shawn Heisey
On 8/6/2013 6:15 AM, Spadez wrote: Hi, I am running Solr4 on Jetty9 and I am trying to include the JDBC drivers for both MySQL and PostgreSQL. I'm a little confused about how to do this. I believe these are the two files I need:

Re: Measuring SOLR performance

2013-08-06 Thread Dmitry Kan
Hi, Thanks for the clarification, Shawn! So with this in mind, the following work: http://localhost:8983/solr/statements/admin/system?wt=json http://localhost:8983/solr/statements/admin/mbeans?wt=json not copying their output to save space. Roman: is this something that should be set via -t

Multiple sorting does not work as expected

2013-08-06 Thread Mysurf Mail
My documents have 2 indexed attributes: name (string) and version (number). I want documents with the same score to be displayed in the following order: score (desc), name (desc), version (desc). Therefore I query using: http://localhost:8983/solr/vault/select?q=BOM&fl=*:score

Spellchecker suggests Tokens

2013-08-06 Thread Snubbel
Hello, I have a problem getting started with DirectSolrSpellChecker. I use NGramFilterFactory to index and query strings of length greater than 3. So if I index the word aQuiteLongWord, I can search for long and get the result. Now I'm adding the DirectSolrSpellChecker. And when searching for

Re: Multiple sorting does not work as expected

2013-08-06 Thread Mysurf Mail
My schema: <field name="Name" type="text_en" indexed="true" stored="true" required="true"/> <field name="Version" type="int" indexed="true" stored="true" required="true"/> On Tue, Aug 6, 2013 at 5:06 PM, Mysurf Mail stammail...@gmail.com wrote: My documents have 2 indexed attributes - name (string) and

Re: Solr MaxCollections

2013-08-06 Thread Jack Krupansky
Although there is no hard limit or published guidelines, I would say that you should try to limit your number of collections per cluster to dozens or no more than 100. More than that and you are in uncharted territory. If it works for you, fine, but if it doesn't please don’t complain. But...

Help importing xml file as raw xml

2013-08-06 Thread jimtronic
Hi, I found a few threads out there dealing with this problem, but there didn't really seem to be much detail to the solution. I have large xml files (500M to 2+ G) with a complex nested structure. It's impossible for me to import the exact structure into a solr representation, and, honestly, I

Re: Multiple sorting does not work as expected

2013-08-06 Thread Jack Krupansky
The Name field is sorted as you have requested - desc. I suspect that you wanted name to be sorted asc (natural order.) -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Tuesday, August 06, 2013 10:22 AM To: solr-user@lucene.apache.org Subject: Re: Multiple sorting does

Solr 4.4 and Google Protobuf

2013-08-06 Thread Guido Medina
Hi, I saw that the solr.war file contains protobuf version 2.4.0a. I have two questions about it: 1. Where does Solr use protobuf? And is it better than HTTP? 2. Why is it such an old version, when the recommended protobuf versions are 2.4.1 and 2.5.0 - 2.5.0 has an extra 25% performance

Re: Knowing what field caused the retrieval of the document

2013-08-06 Thread Jack Krupansky
Add the debugQuery=true parameter and the explain section will detail exactly which terms matched for each document. You could also use the Solr term vectors component to get info on which terms occur where in a document, but that adds more overhead to the index for stored term vectors. --

Re: How to plan field boosting

2013-08-06 Thread Jack Krupansky
Mostly guessing and trial and error - and eventually experience - unless you are able to do tf-idf similarity math in your head! You can look at the explain section of the output of the debugQuery=true parameter and work through the math yourself as well. Look at the final scores of documents

'Optimizing' Solr Index Size

2013-08-06 Thread Brendan Grainger
Hi All, First of all, what I'm actually trying to do is get a little space back, so if there is a better way to do this by adjusting the MergePolicy or something else, please let me know. My index is currently 200 GB. In the past (Solr 1.4) we've found that optimizing the index will

Schema Lint

2013-08-06 Thread Steven Bower
Is there an easy way in code / command line to lint a solr config (or even just a solr schema)? Steve

Re: Adding Postgres and Mysql JDBC drivers to Solr

2013-08-06 Thread Spadez
Thank you very much

Re: Multiple sorting does not work as expected

2013-08-06 Thread Mysurf Mail
I don't see how it is sorted. This is the order as displayed above: 1- BOM Total test2 2- BOM Total test - Copy 3- BOM Total test2, all with the same score of 2.2388418. On Tue, Aug 6, 2013 at 5:28 PM, Jack Krupansky j...@basetechnology.com wrote: The Name field is sorted as you have requested -
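To see what a plain multi-key sort would produce, here is a minimal Java sketch applying score desc, Name desc, Version desc (the sort string in `SORT_PARAM`) to the three names and the score from the post; the version numbers are made up. With whole-string comparison the two "BOM Total test2" documents end up adjacent, so the interleaved order reported above does not look like a sort on untokenized name values. The schema posted in this thread declares Name as text_en, a tokenized type, and Solr's documentation recommends sorting only on untokenized fields, so sorting on a string copy of Name may be worth trying; that diagnosis is an inference, not something confirmed in the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MultiFieldSort {
    // The sort parameter from the question, using the schema's field names.
    static final String SORT_PARAM = "score desc,Name desc,Version desc";

    static final class Doc {
        final double score;
        final String name;
        final int version;

        Doc(double score, String name, int version) {
            this.score = score;
            this.name = name;
            this.version = version;
        }

        @Override
        public String toString() {
            return score + " | " + name + " | v" + version;
        }
    }

    // score desc, then name desc (whole-string comparison), then version desc.
    static final Comparator<Doc> ORDER =
            Comparator.comparingDouble((Doc d) -> d.score).reversed()
                    .thenComparing((Doc d) -> d.name, Comparator.reverseOrder())
                    .thenComparing(Comparator.comparingInt((Doc d) -> d.version).reversed());

    static List<Doc> sort(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        // Names and score from the post; the version numbers are made up.
        List<Doc> docs = List.of(
                new Doc(2.2388418, "BOM Total test2", 1),
                new Doc(2.2388418, "BOM Total test - Copy", 1),
                new Doc(2.2388418, "BOM Total test2", 2));
        sort(docs).forEach(System.out::println);
    }
}
```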

Re: Solr 4.4 and Google Protobuf

2013-08-06 Thread Shawn Heisey
On 8/6/2013 8:37 AM, Guido Medina wrote: I saw that the solr.war file contains protobuf version 2.4.0a. I have two questions about it: 1. Where does Solr use protobuf? And is it better than HTTP? 2. Why is it such an old version, when the recommended protobuf versions are 2.4.1 and 2.5.0 -

Re: 'Optimizing' Solr Index Size

2013-08-06 Thread Brendan Grainger
Well, I guess I can answer one of my questions, which I didn't explicitly state: how do I force Solr to merge segments down to a given maximum? I forgot about doing this: curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false' which reduced the number of
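For triggering the same partial optimize from Java instead of curl, a minimal sketch using the JDK's HTTP client types (Java 11 or later). It only builds the GET request for the URL from the post and does not send it; the Solr base URL is the one in the curl command and should be adjusted for a different host or core.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class OptimizeRequest {
    // Same URL as the curl command; solrBase is e.g. http://localhost:8983/solr.
    static URI optimizeUri(String solrBase, int maxSegments) {
        return URI.create(solrBase + "/update?optimize=true&maxSegments=" + maxSegments
                + "&waitFlush=false");
    }

    // Builds (but does not send) a GET request for that URL, as plain curl does.
    static HttpRequest optimizeRequest(String solrBase, int maxSegments) {
        return HttpRequest.newBuilder(optimizeUri(solrBase, maxSegments)).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = optimizeRequest("http://localhost:8983/solr", 10);
        System.out.println(request.method() + " " + request.uri());
        // Sending it requires a running Solr, e.g.:
        // java.net.http.HttpClient.newHttpClient()
        //     .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```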

Re: 'Optimizing' Solr Index Size

2013-08-06 Thread Brendan Grainger
To maybe answer another one of my questions, about the 50 GB recovered when running: curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false' It looks to me like it came from deleted docs being completely removed from the index. Thanks On Tue, Aug 6, 2013 at 11:45

TermRangeTermsEnum usage and performance

2013-08-06 Thread Chet Vora
Hi, I have an index consisting of a double value that can range between certain values and an associated tag. I am trying to find all the docs which match a certain tag (or combination of tags) and a certain range. I'm trying to use the TermRangeTermsEnum from the Flex API as part of a custom

Re: Knowing what field caused the retrieval of the document

2013-08-06 Thread Jeff Wartes
For what it's worth, I had the same question last year, and I never really got a good solution: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3C81 e9a7879c550b42a767f0b86b2b81591a15b...@ex4.corp.w3data.com%3E I dug into the highlight component for a while, but it turned

Re: Transform data at index time: country - continent

2013-08-06 Thread Walter Underwood
Would synonyms help? If you generate the query terms for the continents, you could do something like this: usa => continent-na canada => continent-na germany => continent-europe and so on. wunder On Aug 6, 2013, at 2:18 AM, Christian Köhler - ZFMK wrote: On 05.08.2013 15:52, Jack
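If the mapping table lives somewhere else (a spreadsheet, a database), the synonym lines can be generated rather than typed. A minimal Java sketch, assuming single-word, lowercased country names as in Walter's example. With `keepOriginal=false` it emits the "usa => continent-na" form, where the synonym filter replaces the country token; with `keepOriginal=true` it emits "usa => usa, continent-na", which keeps both tokens searchable.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ContinentSynonyms {
    // Turns a country -> continent-token map into explicit synonym mappings.
    // keepOriginal=false: "usa => continent-na" (the country token is replaced).
    // keepOriginal=true:  "usa => usa, continent-na" (both tokens are kept).
    static List<String> synonymLines(Map<String, String> countryToContinent, boolean keepOriginal) {
        return countryToContinent.entrySet().stream()
                .map(e -> e.getKey() + " => "
                        + (keepOriginal ? e.getKey() + ", " : "") + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The three mappings from Walter's example.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("usa", "continent-na");
        mapping.put("canada", "continent-na");
        mapping.put("germany", "continent-europe");
        synonymLines(mapping, false).forEach(System.out::println);
    }
}
```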

Re: Knowing what field caused the retrieval of the document

2013-08-06 Thread Raymond Wiker
One option might be to run two queries with fq set to +name:"whatever phrase" and +comment:"whatever phrase". The query results may then be annotated and merged (assuming that the hit scores only depend on the main query and the document content - i.e., no normalization, and no score contribution
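The annotate-and-merge step can be sketched in Java as below, assuming you already have the document ids returned by each per-field query (same q, different fq); the ids and field names in `main` are illustrative. This sketch ignores scores and keeps ids in first-seen order; ordering the merged list by score relies on the assumption in the post that scores do not depend on the filter.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MatchedFieldMerger {
    // Filter for one field, e.g. +name:"whatever phrase" (quotes inside the
    // phrase itself are not escaped in this sketch).
    static String fqFor(String field, String phrase) {
        return "+" + field + ":\"" + phrase + "\"";
    }

    // Takes the document ids returned by each per-field query and returns
    // id -> fields whose filter the document matched, in first-seen id order.
    static Map<String, Set<String>> annotate(Map<String, List<String>> idsByField) {
        Map<String, Set<String>> matchedBy = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : idsByField.entrySet()) {
            for (String id : entry.getValue()) {
                matchedBy.computeIfAbsent(id, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return matchedBy;
    }

    public static void main(String[] args) {
        System.out.println(fqFor("name", "whatever phrase"));
        // Illustrative ids as they might come back from the two queries.
        Map<String, List<String>> idsByField = new LinkedHashMap<>();
        idsByField.put("name", List.of("doc1", "doc2"));
        idsByField.put("comment", List.of("doc2", "doc3"));
        System.out.println(annotate(idsByField));
    }
}
```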

Re: Suggest aka autocomplete request handler with solr 4.4

2013-08-06 Thread Utkarsh Sengar
Jack/Chris, 1. This is my complete schema.xml: https://gist.github.com/utkarsh2012/6167128/raw/1d5ac6520b666435cd040b5cc6dcb434cdfd7925/schema.xml More specifically, allText is of type text_general, which has a LowerCaseFilterFactory at index time. 2. allText has values:

SolrCloud Indexing question

2013-08-06 Thread Kalyan Kuram
Hi All, I need suggestions on how to send indexing commands to 2 different Solr servers. Basically I want to mirror my index. Here is the scenario: I have 2 clusters, each with one master and 2 slaves, with an external ZooKeeper in front. I need suggestions on what Solr API class I should use to send

Problems with distributed MoreLikeThis

2013-08-06 Thread Shawn Heisey
I'm having some problems with distributed MLT. On 4.4, it seems completely broken. Searches that work on 4.2.1 return an exception on 4.4.0. This stackoverflow post shows the EarlyTerminatingCollectorException I'm getting:

Re: SolrCloud Indexing question

2013-08-06 Thread Shawn Heisey
On 8/6/2013 12:55 PM, Kalyan Kuram wrote: Hi All, I need suggestions on how to send indexing commands to 2 different Solr servers. Basically I want to mirror my index. Here is the scenario: I have 2 clusters, each with one master and 2 slaves, with an external ZooKeeper in front. I need suggestions

problems running solr 4.4 with HDFS HA

2013-08-06 Thread Greg Walters
Good day, I've been working to test Solr 4.4 in our dev environment with the HDFS integration that was just announced and am having some issues getting NameNode HA to work. To start off with I had to change out all of the Hadoop jars in WEB-INF/lib/ with the matching jars from our Hadoop

RE: external zookeeper with SolrCloud

2013-08-06 Thread Joshi, Shital
Hi, We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 ZooKeeper instances. We are planning to change to an odd number of ZooKeeper instances. With Solr 4.3.0, if all ZooKeeper instances are not up, the solr4 node never connects to ZooKeeper (can't see the admin

Re: external zookeeper with SolrCloud

2013-08-06 Thread Erick Erickson
First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem?
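The arithmetic behind moving to an odd number: a ZooKeeper ensemble keeps working only while a strict majority of its servers is up. A minimal Java sketch of that rule, plus a helper that builds a host:port list for -DzkHost; the host names zk1, zk2, zk3 come from the thread, and 2181 is ZooKeeper's default client port (adjust if yours differ). Six servers need four for a quorum and so tolerate two failures, the same as five.

```java
import java.util.StringJoiner;

class ZkEnsembleMath {
    // A ZooKeeper ensemble needs a strict majority of its servers to operate.
    static int quorumSize(int servers) {
        return servers / 2 + 1;
    }

    // How many servers can be down while a majority is still available.
    static int toleratedFailures(int servers) {
        return servers - quorumSize(servers);
    }

    // Builds a comma-separated host:port list for the zkHost setting.
    static String zkHost(int port, String... hosts) {
        StringJoiner joiner = new StringJoiner(",");
        for (String host : hosts) {
            joiner.add(host + ":" + port);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 6}) {
            System.out.println(n + " servers: quorum " + quorumSize(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
        System.out.println("-DzkHost=" + zkHost(2181, "zk1", "zk2", "zk3"));
    }
}
```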

Re: Schema Lint

2013-08-06 Thread Andy Lester
On Aug 6, 2013, at 9:55 AM, Steven Bower smb-apa...@alcyon.net wrote: Is there an easy way in code / command line to lint a solr config (or even just a solr schema)? No, there's not. I would love there to be one, especially for the DIH. -- Andy Lester = a...@petdance.com = www.petdance.com

RE: external zookeeper with SolrCloud

2013-08-06 Thread Joshi, Shital
Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 06, 2013 5:03 PM To:

Re: problems running solr 4.4 with HDFS HA

2013-08-06 Thread Mark Miller
On Aug 6, 2013, at 3:15 PM, Greg Walters gwalt...@sherpaanalytics.com wrote: -Dsolr.hdfs.confdir=/etc/hadoop/conf.cloudera.hdfs1 Have you set that up in the directoryFactory section of solrconfig.xml? Make sure you have something like: <directoryFactory name="DirectoryFactory"

Re: Unexpected behavior when sorting groups

2013-08-06 Thread Paul Masurel
Here is some detail about how grouping is implemented in Solr. http://fulmicoton.com/posts/grouping-in-solr/ On Mon, Aug 5, 2013 at 2:42 AM, Tony Paloma to...@valvesoftware.com wrote: Thanks Paul. That's helpful. I'm not familiar with the concept of custom caches. Would this be custom Java

Re: Schema Lint

2013-08-06 Thread Alexandre Rafalovitch
Funny you should ask. Here are the relevant suggestions from the Solr Usability contest that is going on right now: *) https://solrstart.uservoice.com/forums/216001-usability-contest/suggestions/4249791-solr-lint-a-tool-to-check-solr-configuration-and *)

Re: Transform data at index time: country - continent

2013-08-06 Thread Jack Krupansky
I've implemented a JavaScript script for the StatelessScriptUpdate processor that does country code to continent code mapping. It will appear in the next early access of my Solr 4.x Deep Dive book (on 8/16.) One interesting issue: These countries that span continents - Turkey and Russia and
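Jack's implementation is a JavaScript script for the StatelessScriptUpdate processor and is not reproduced here. As a rough idea of the lookup logic only, here is a minimal Java sketch that maps a country code to a list of continent codes, so that countries spanning continents, such as Turkey and Russia, can return more than one. The tiny table is a hypothetical sample using ISO country codes and two-letter continent codes, not a complete mapping; in an update processor the returned codes would go into a multivalued field.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ContinentMapper {
    // Tiny hypothetical sample of a country-code -> continent-code table.
    // Transcontinental countries map to more than one continent.
    private static final Map<String, List<String>> CONTINENTS = Map.of(
            "US", List.of("NA"),
            "CA", List.of("NA"),
            "DE", List.of("EU"),
            "TR", List.of("EU", "AS"),
            "RU", List.of("EU", "AS"));

    // Returns the continent codes for a country code (case-insensitive),
    // or an empty list if the code is null or unknown.
    static List<String> continentsFor(String countryCode) {
        if (countryCode == null) {
            return List.of();
        }
        return CONTINENTS.getOrDefault(countryCode.trim().toUpperCase(Locale.ROOT), List.of());
    }

    public static void main(String[] args) {
        for (String code : new String[] {"de", "TR", "XX"}) {
            System.out.println(code + " -> " + continentsFor(code));
        }
    }
}
```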

entity classification solr

2013-08-06 Thread smanad
I have the following situation when using Solr 4.3. My documents contain entities, for example "peanut butter". I have a list of such entities. These are items that go together and are not to be treated as two individual words. During indexing, I want Solr to realize this and treat "peanut butter" as

Solr design. Choose Cores or Shards?

2013-08-06 Thread manju16832003
Hi, I'm unsure whether to choose cores or shards for my project. My scenario is as follows. I have three entities: 1. Customers 2. Product Info 3. Listings [contains all the listings posted by customers, based on product]. I'm planning to design the Solr structure for the above scenario

Re: entity classification solr

2013-08-06 Thread manju16832003
Can you provide a sample structure of the document with entities? What does the document look like? As far as I can tell, you do not need to apply any filters. If your entities are searchable, include them in the full-text or keyword search. Are your entities part of the document and are

Re: Problems with distributed MoreLikeThis

2013-08-06 Thread manju16832003
I'm not sure about the root cause in your case. However, one thing to remember when using MLT is that *MLT does not work with integer fields*. In your case, if 'catchall' is a copyField and you are copying any integer values into it, verify that again :-). Thanks

Re: Measuring SOLR performance

2013-08-06 Thread Roman Chyla
Hi Dmitry, I've modified solrjmeter to retrieve data from under the core (the -t parameter) and the rest from /solr/admin. I could only test it against 4.0, but it seems to be the same there as in 4.3, so you can try a fresh checkout. My test was: python solrjmeter.py -a -x

Re: Solr list all records but fq matching records first

2013-08-06 Thread Thyagaraj
I verified it; the code is correct. I just highlighted a few things in bold. Below I have pasted it again: Method 1 SolrQuery query = new SolrQuery().setStart(first).setRows( searchCommand.getRowsPerPage()); //setting query query.setQuery(*); //setting

Re: Solr MaxCollections

2013-08-06 Thread Srivatsan
Thanks, Jack

Re: Collection - loadOnStartup

2013-08-06 Thread Srivatsan
Thanks, Erick