Re: Trouble handling Unit symbol

2012-04-13 Thread Rajani Maski
Hi All, I tried to index with UTF-8 encoding but the issue is still not fixed. Please see my inputs below. *Indexed XML:* <?xml version="1.0" encoding="UTF-8"?> <add> <doc> <field name="ID">0.100</field> <field name="BODY">µ</field> </doc> </add> *Search Query - * BODY:µ numfound : 0 results

How to read SOLR cache statistics?

2012-04-13 Thread Kashif Khan
Can anyone explain what the following parameters mean in the SOLR cache statistics? *name*: queryResultCache *class*: org.apache.solr.search.LRUCache *version*: 1.0 *description*: LRU Cache(maxSize=512, initialSize=512) *stats*: lookups : 98 *hits*: 59 *hitratio*: 0.60

Re: Lexical analysis tools for German language data

2012-04-13 Thread Tomas Zerolo
On Thu, Apr 12, 2012 at 03:46:56PM +, Michael Ludwig wrote: From: Walter Underwood German noun decompounding is a little more complicated than it might seem. There can be transformations or inflections, like the s in Weihnachtsbaum (Weihnachten/Baum). I remember from my

Re: Solr Scoring

2012-04-13 Thread Li Li
Another way is to use payloads: http://wiki.apache.org/solr/Payloads The advantage of payloads is that you only need one field, which can make the frq file smaller than when using two fields. The disadvantage is that payloads are stored in the prx file, so I am not sure which one is faster. Maybe you can try them both. On

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-13 Thread Mikhail Khludnev
Did I get it right that you have two separate processes (different apps) accessing the same Lucene Directory simultaneously? In this case I suggest reading about the locking mechanism. I'm not really experienced with it. You showed logs from the StrUpdHandler failure; that part is clear. Can you show logs from Embedded

Re: How to read SOLR cache statistics?

2012-04-13 Thread Li Li
http://wiki.apache.org/solr/SolrCaching On Fri, Apr 13, 2012 at 2:30 PM, Kashif Khan uplink2...@gmail.com wrote: Does anyone explain what does the following parameters mean in SOLR cache statistics? *name*: queryResultCache *class*: org.apache.solr.search.LRUCache *version*: 1.0

AW: Lexical analysis tools for German language data

2012-04-13 Thread Michael Ludwig
From: Tomas Zerolo There can be transformations or inflections, like the s in Weihnachtsbaum (Weihnachten/Baum). I remember from my linguistics studies that the terminus technicus for these is Fugenmorphem (interstitial or joint morpheme) [...] IANAL (I am not a linguist -- pun
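To make the Fugenmorphem point concrete, here is a minimal toy sketch of dictionary-based decompounding that tolerates a linking element between the two parts. It is not what any Solr or Lucene filter actually does; the function name `split_compound`, the list of linking elements, and the two-word lexicon are all assumptions made up for illustration.

```python
# Toy illustration only: a real decompounder needs a full lexicon and
# handles more transformations than the linking elements listed here.
LINKING_ELEMENTS = ("", "s", "es", "n", "en", "er", "e")  # assumed, not exhaustive


def split_compound(word, lexicon):
    """Split a two-part compound into (head, linking_element, tail), or None."""
    w = word.lower()
    for i in range(1, len(w)):
        left, tail = w[:i], w[i:]
        if tail not in lexicon:
            continue
        # Try to peel a linking element off the end of the left part.
        for link in LINKING_ELEMENTS:
            if not left.endswith(link):
                continue
            head = left[: len(left) - len(link)]
            if head in lexicon:
                return head, link, tail
    return None


lexicon = {"weihnacht", "baum"}  # hypothetical two-word lexicon
print(split_compound("Weihnachtsbaum", lexicon))
```

With this lexicon the word splits into the head, the joint "s", and the tail. Note that the head comes out as "weihnacht", not "Weihnachten", which is exactly the kind of transformation the thread says makes decompounding harder than it looks.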

Re: two structures in solr

2012-04-13 Thread tkoomzaaskz
Thank you very much Erick for your reply! So should it go something like the following: http://lucene.472066.n3.nabble.com/file/n3907393/solr_index.png sorry for an ugly drawing ;) In this example, the index will have 13 columns: 6 for project, 6 for contractor and one to define the type. Is

Re: Boost differences in two environments for same query and config

2012-04-13 Thread Kerwin
Hi Erick, Thanks for your suggestions. I did an optimize on the remote installation and this time with the same number of documents but still face the same issue as seen from the debug output below: 9.950362E-4 = (MATCH) sum of: 9.950362E-4 = (MATCH) weight(RECORD_TYPE:info in 35916),

Re: Solr Scoring

2012-04-13 Thread Kissue Kissue
Thanks a lot. I had already implemented Walter's solution and was wondering if this was the right way to deal with it. This has now given me the confidence to go with the solution. Many thanks. On Fri, Apr 13, 2012 at 1:04 AM, Erick Erickson erickerick...@gmail.com wrote: GAH! I had my head in

Re: Facets involving multiple fields

2012-04-13 Thread Marc SCHNEIDER
Hi, Thanks for your answer. Yes it works in this case when I know the facet name (Computer). What if I want to automatically compute all facets? facet.query=keyword:* short_title:* doesn't work, right? Marc. On Thu, Apr 12, 2012 at 2:08 PM, Erick Erickson erickerick...@gmail.com wrote:

Re: How to read SOLR cache statistics?

2012-04-13 Thread Kashif Khan
Hi Li Li, I have been through that wiki page before, but it does not explain what *evictions*, *inserts*, *cumulative_inserts*, *cumulative_evictions*, *hitratio* and the rest mean. These terms are foreign to me. What does the following line mean? *item_ABC :

Issues with language based indexing

2012-04-13 Thread JGar
Hello, I am new to Solr. It returns some docs when I search for the string Acciones y Valores. When I search for the same words in the given doc manually, I cannot find them. Please help me understand on what basis the doc is matched by the search. Thanks

Realtime /get versus SearchHandler

2012-04-13 Thread Benson Margulies
A discussion over on the dev list led me to expect that the by-id field retrievals in a SolrCloud query would come through the get handler. In fact, I've seen them turn up in my search component in the search handler that is configured with my custom QT. (I have a 'prepare' method that sets

Re: Trouble handling Unit symbol

2012-04-13 Thread Erick Erickson
Please review: http://wiki.apache.org/solr/UsingMailingLists Especially the bit about adding debugQuery=on and showing the results. You're asking people to guess at solutions without providing much in the way of context. You might try looking at your index with Luke to see what's actually in

Re: two structures in solr

2012-04-13 Thread Erick Erickson
bq: Is that right? I don't know, does it work G? You'll probably want an additional field for unique id (just named id in the example) that should be disjoint between your types. Best Erick On Fri, Apr 13, 2012 at 3:41 AM, tkoomzaaskz tomasz.du...@gmail.com wrote: Thank you very much Erick for
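As a rough sketch of the single-index layout discussed in this thread (one `type` field plus a unique `id` that is disjoint between the types), documents could be shaped like this before being sent to Solr. The helper `make_doc`, the type-prefixed field names, and the sample values are hypothetical; the thread does not say how the ids should be constructed.

```python
def make_doc(doc_type, source_id, fields):
    """Build one index document with a type marker and a type-prefixed id."""
    # Assumption: prefixing the id with the type keeps ids disjoint even if
    # a project and a contractor share the same database key.
    doc = {"id": f"{doc_type}-{source_id}", "type": doc_type}
    # Prefix field names so project and contractor columns do not collide.
    doc.update({f"{doc_type}_{name}": value for name, value in fields.items()})
    return doc


project = make_doc("project", 1, {"title": "Bridge repair"})
contractor = make_doc("contractor", 1, {"name": "ACME"})
print(project)
print(contractor)
```

Searches for one structure would then presumably filter on the `type` field, for example with a filter query such as `fq=type:project`.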

Re: Boost differences in two environments for same query and config

2012-04-13 Thread Erick Erickson
Well, next thing I'd do is just copy your entire solr home directory to the remote machine and try that. If that gives identical results on both, then try moving just your solr home/data directory to the remote machine. I suspect that you've done something different between the two machines

Re: Trouble handling Unit symbol

2012-04-13 Thread Rajani Maski
Fine. Thank you. I will look at it. On Fri, Apr 13, 2012 at 5:21 PM, Erick Erickson erickerick...@gmail.com wrote: Please review: http://wiki.apache.org/solr/UsingMailingLists Especially the bit about adding debugQuery=on and showing the results. You're asking people to guess at solutions

Re: Facets involving multiple fields

2012-04-13 Thread Erick Erickson
Nope. Information about your higher level use-case would probably be a good thing, this is starting to smell like an XY problem. Best Erick On Fri, Apr 13, 2012 at 5:48 AM, Marc SCHNEIDER marc.schneide...@gmail.com wrote: Hi, Thanks for your answer. Yes it works in this case when I know the

Solr data export to CSV File

2012-04-13 Thread Pavnesh
Hi Team, Many thanks to all of you who developed such a nice product. I have one question regarding Solr: I have approximately 36 million records in my Solr index and I want to export all the data to a CSV file, but I have found nothing on this, so please help me with this topic. Regards

Re: How to read SOLR cache statistics?

2012-04-13 Thread Erick Erickson
Well, the place to start is here: *stats*: lookups : 98 *hits *: 59 *hitratio *: 0.60 *inserts *: 41 *evictions *: 0 *size *: 41 the important bits are hitratio and evictions. Caches only really start to show their stuff when the hit ratio is quite high. That's the percentage of requests that
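For reference, the hit ratio in these statistics is simply hits divided by lookups. The small sketch below recomputes it from the numbers quoted above; the function name `hit_ratio` is just an illustrative helper, not part of Solr.

```python
def hit_ratio(hits, lookups):
    """Fraction of cache lookups that were answered from the cache."""
    return hits / lookups if lookups else 0.0


# Values copied from the queryResultCache statistics quoted in this thread.
stats = {"lookups": 98, "hits": 59, "inserts": 41, "evictions": 0, "size": 41}
ratio = hit_ratio(stats["hits"], stats["lookups"])
print(f"{ratio:.2f}")  # 0.60
```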

Re: performance impact using string or float when querying ranges

2012-04-13 Thread Erick Erickson
Well, I guess my first question is whether using strings is fast enough, in which case there's little reason to make your life more complex. But yes, range queries will be significantly faster with any of the Trie types than with strings. Trie types are all numeric types. Best Erick On Fri,

Re: Issues with language based indexing

2012-04-13 Thread Erick Erickson
Please review: http://wiki.apache.org/solr/UsingMailingLists there's so little information to go on here that I really can't say anything that isn't a guess. At a minimum we need the raw input, the fieldType definitions from your schema, the results of adding debugQuery=on to your URL Best

Re: Solr data export to CSV File

2012-04-13 Thread Erick Erickson
Does this help? http://wiki.apache.org/solr/CSVResponseWriter Best Erick On Fri, Apr 13, 2012 at 7:59 AM, Pavnesh pavnesh.ku...@altruistindia.com wrote: Hi Team, A very-very thanks to you guy who had developed such a nice product. I have one query regarding solr that I have app 36

RE: Realtime /get versus SearchHandler

2012-04-13 Thread Darren Govoni
Yes --- Original Message --- On 4/13/2012 06:25 AM Benson Margulies wrote: A discussion over on the dev list led me to expect that the by-id field retrievals in a SolrCloud query would come through the get handler. In fact, I've seen them turn up in my search component in

RE: Solr data export to CSV File

2012-04-13 Thread Ben McCarthy
A combination of the CSV response writer and SolrJ to page through all of the results, sending each line to something like Apache Commons FileUtils: FileUtils.writeStringToFile(new File("output.csv"), outputLine + System.getProperty("line.separator"), true); would be quite quick to knock up in Java. Thanks
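The paging idea can be sketched without SolrJ as well. The snippet below only builds the request URLs for successive pages using the CSV response writer (`wt=csv`) and suppresses the header row after the first page via `csv.header`; the base URL, the match-all query, and the page size are assumptions, and actually fetching each URL and appending the body to a file is left out.

```python
from urllib.parse import urlencode


def csv_page_urls(base_url, query, total, rows):
    """Yield one /select URL per page of `rows` documents, CSV-formatted."""
    for start in range(0, total, rows):
        params = {
            "q": query,
            "wt": "csv",
            "start": start,
            "rows": rows,
            # Only the first page should carry the CSV header line.
            "csv.header": "true" if start == 0 else "false",
        }
        yield f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and tiny totals, just to show the shape of the URLs.
for url in csv_page_urls("http://localhost:8983/solr", "*:*", total=25, rows=10):
    print(url)
```

For tens of millions of documents, large `start` offsets are likely to get slow, so it is probably worth testing the page size and throughput on a subset first.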

Re: searching across multiple fields using edismax - am i setting this up right?

2012-04-13 Thread geeky2
thank you for the response. it seems to be working well ;) 1) i tried your suggestion about removing the qt parameter - *somecore/partItemNoSearch*?q=dishwasher&debugQuery=on&rows=10 but this results in a 404 error message - is there some configuration i am missing to support this short-hand

Re: searching across multiple fields using edismax - am i setting this up right?

2012-04-13 Thread Erick Erickson
As to 1) you have to define your request handler with a leading /, as in name="/partItemNoSearch". Don't forget to restart your server. 3) Of course. The input terms MUST be run through the associated analysis chain to have any hope of matching correctly. Best Erick On Fri, Apr 13, 2012 at 8:36
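Once the handler is registered with a leading slash, it is addressed as a path under the core rather than through the qt parameter. A small sketch of building such a URL follows; the host, the helper name `handler_url`, and the parameter values are assumptions taken from the example in this thread.

```python
from urllib.parse import urlencode


def handler_url(base_url, core, handler, **params):
    """Address a request handler by path; assumes it is registered with a leading '/'."""
    path = handler if handler.startswith("/") else "/" + handler
    return f"{base_url}/{core}{path}?{urlencode(params)}"


print(
    handler_url(
        "http://localhost:8983/solr",  # hypothetical host
        "somecore",
        "/partItemNoSearch",
        q="dishwasher",
        debugQuery="on",
        rows=10,
    )
)
```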

Errors during indexing

2012-04-13 Thread Ben McCarthy
Hello We have just switched to Solr4 as we needed the ability to return geodist() along with our results. I use a simple multithreaded java app and solr to ingest the data. We keep seeing the following: 13-Apr-2012 15:50:10 org.apache.solr.common.SolrException log SEVERE:

RE: solr 3.5 taking long to index

2012-04-13 Thread Rohit
Hi Shawn, Thanks for the information, let me give this a try, since this is a live box I will try it during the weekend and update you. Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: 13

Solr is not extracting the CDATA part of xml

2012-04-13 Thread srini
I am trying to use the method suggested in the Solr forum to remove the CDATA part of XML, but it is not working. The result shows the whole XML content instead of the CDATA part. schema.xml <fieldType name="text_ws2" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread srini
not sure why CDATA part did not get interpreted. this is how xml content looks like. I added quotes just to present the exact content xml content. body/body -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html

Re: performance impact using string or float when querying ranges

2012-04-13 Thread Yonik Seeley
On Fri, Apr 13, 2012 at 8:11 AM, Erick Erickson erickerick...@gmail.com wrote: Well, I guess my first question is whether using strings is fast enough, in which case there's little reason to make your life more complex. But yes, range queries will be significantly faster with any of the Trie

mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Peter Wolanin
Trying to maintain the Drupal integration module across multiple versions of 3.x, we've gotten a bug report suggesting that Solr 3.6 needs this change to solrconfig: - <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy> + <mergePolicy

RE: mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Michael Ryan
It looks like the first format was removed in 3.6 as part of https://issues.apache.org/jira/browse/SOLR-1052. The second format works in all 3.x versions. -Michael -Original Message- From: Peter Wolanin [mailto:peter.wola...@acquia.com] Sent: Friday, April 13, 2012 12:32 PM To:

Re: mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Peter Wolanin
Ok, thanks for the info. As long as the second one works, we can just use that. I just verified that it works for 3.5 at least. -Peter On Fri, Apr 13, 2012 at 1:12 PM, Michael Ryan mr...@moreover.com wrote: It looks like the first format was removed in 3.6 as part of

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread Erick Erickson
Solr does not index arbitrary XML content. There is an XML form of a Solr document that can be sent to Solr, but it is a specific form of XML. An example of the XML you're trying to index and what you mean by not working would be helpful. Best Erick On Fri, Apr 13, 2012 at 11:50 AM, srini

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread srini
Erick, Thanks for your reply. When you say Solr does not index arbitrary XML documents: below is what my XML document looks like; it is stored in Oracle. Could you suggest the best way of indexing it? Which method should I follow? Should I use XPathEntityProcessor? <?xml version="1.0"

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread Erick Erickson
Right, that will not work at all for direct transmission to Solr. You could write a Java program that parses this and sends it to Solr via SolrJ. Personally I haven't connected a database to Solr with XPathEntityProcessor in the mix, but I believe I've seen messages go by with this

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread Alexander Aristov
Hi This is not solr format. You must re-format your XML into solr XML. you may find examples on solr wiki or in solr examples dir. Best Regards Alexander Aristov On 13 April 2012 23:13, srini softtec...@gmail.com wrote: Erick, Thanks for your reply. when you say Solr does not index

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread srini
Thanks again for the quick reply. I am a little curious about the procedure you suggested. I had thought of using the same procedure: writing a Java program to fetch each XML record from the db, parse the content, and hand it to Solr for indexing. But what if my database content changes? should I re

Re: Boosting StandardQuery scores with a subquery?

2012-04-13 Thread Chris Hostetter
: I'm having some trouble wrapping my head around boosting StandardQueries. : It looks like the function: query(subquery, default) : http://wiki.apache.org/solr/FunctionQuery#query is what I want, but the : examples seem to focus on just returning a score (e.g. product of popularity : and the

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-13 Thread Jan Høydahl
Hi, For a web crawl+search like this you will probably need a lot of additional Big Data crunching, so a Hadoop based solution is wise. In addition to those products mentioned we also now have Amazon's own CloudSearch http://aws.amazon.com/cloudsearch/ It's new, is not as cool as Solr (not

Re: Post Sorting hook before the doc slicing.

2012-04-13 Thread Chris Hostetter
: Basically, I need to find item X in the result set and return say N items : before and N items after. : : - N items -- Item X --- N items ... : So I might be wrong, but it looks like the only way would be to create a : custom SolrIndexSearcher which will find the offset and
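The behaviour being asked for (item X plus N neighbours on each side) can be stated precisely with a small client-side sketch over an already sorted list of ids. This only illustrates the desired result, not a way to do it inside Solr; the function name `window_around` and the sample ids are made up.

```python
def window_around(doc_ids, target, n):
    """Return up to n ids before the target, the target itself, and up to n ids after it."""
    try:
        pos = doc_ids.index(target)
    except ValueError:
        return []  # target is not in the result set
    start = max(0, pos - n)  # clamp at the start of the list
    return doc_ids[start : pos + n + 1]


sorted_ids = ["a", "b", "c", "d", "e", "f", "g"]  # hypothetical sorted result set
print(window_around(sorted_ids, "d", 2))
```

Doing this on the client requires fetching the whole sorted id list first, which is presumably the cost the original poster hoped to avoid with a custom searcher.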

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread John Chee
On Fri, Apr 13, 2012 at 2:40 PM, Benson Margulies bimargul...@gmail.com wrote: Given a query including a subquery, is there any way for me to learn that subquery's contribution to the overall document score? I can provide 'why on earth would anyone ...' if someone wants to know. Have you

Re: two structures in solr

2012-04-13 Thread Chris Hostetter
: I need to store *two big structures* in SOLR: projects and contractors. : Contractors will search for available projects and project owners will : search for contractors who would do it for them. http://wiki.apache.org/solr/MultipleIndexes : that *I want to have two structures*. I guess

Re: term frequency outweighs exact phrase match

2012-04-13 Thread alxsss
Hello Hoss, Here are the explain tags for two doc str name=a0127d8e70a6d523 0.021646015 = (MATCH) sum of: 0.021646015 = (MATCH) sum of: 0.02141003 = (MATCH) max plus 0.01 times others of: 2.84194E-4 = (MATCH) weight(content:apache^0.5 in 3578), product of: 0.0029881175 =

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread Benson Margulies
On Fri, Apr 13, 2012 at 6:43 PM, John Chee johnc...@mylife.com wrote: On Fri, Apr 13, 2012 at 2:40 PM, Benson Margulies bimargul...@gmail.com wrote: Given a query including a subquery, is there any way for me to learn that subquery's contribution to the overall document score? I need this

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread Chris Hostetter
: Given a query including a subquery, is there any way for me to learn : that subquery's contribution to the overall document score? You have to just execute the subquery itself ... doc collection and score calculation don't keep track of the subscores. You could do this using functions in the

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread Lance Norskog
This all comes from a database? Here is what you want. The DataImportHandler includes a toolkit for doing full and incremental loading from databases. Read this first: http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/DIHQuickStart Then these:

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread Benson Margulies
On Fri, Apr 13, 2012 at 7:07 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Given a query including a subquery, is there any way for me to learn : that subquery's contribution to the overall document score? You have to just execute the subquery itself ... doc collection and score

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-13 Thread Ali S Kureishy
Thanks Otis. I really appreciate the details offered here. This was very helpful information. I'm going to go through Solandra and Elastic Search and see if those make sense. I was also given a suggestion to use SolrCloud on FuseDFS (that's two recommendations for SolrCloud so far), so I will

dynamic analyzer based on condition

2012-04-13 Thread srinir
Hi, I want to pick different analyzers for the same field for different languages. I can determine the language from a different field. I would have different fieldTypes defined in my schema.xml such as text_en, text_de, text_fr, etc where i specify which analyzer and filter to use during

remoteLink that change it's text

2012-04-13 Thread Marcelo Carvalho Fernandes
Hi! I have the following gsp code... <g:each in="${productInstanceList}" status="i" var="productInstance"> <!-- display product properties omitted --> <g:remoteLink action="addaction" id="${i}" update="[success:'what-to-put-here',failure:'error']"

Re: remoteLink that change it's text

2012-04-13 Thread Marcelo Carvalho Fernandes
Sorry! Wrong list! Marcelo Carvalho Fernandes +55 21 8272-7970 +55 21 2205-2786 On Fri, Apr 13, 2012 at 10:54 PM, Marcelo Carvalho Fernandes mcf2...@gmail.com wrote: Hi! I have the following gsp code... g:each in=${productInstanceList} status=i var=productInstance !-- display