Re: Using XSLT with DIH for a URLDataSource

2010-02-22 Thread Shalin Shekhar Mangar
On Mon, Feb 22, 2010 at 1:18 PM, Roland Villemoes r...@alpha-solutions.dk wrote: Hi, I have to load data for Solr from a UrlDataSource supplying me with an XML feed. In the simple case where I just do a simple XSLT select this works just fine. Just as shown on the wiki
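
The wiki setup being referenced looks roughly like this data-config.xml sketch (the URL, stylesheet path, and entity name here are made-up illustrations, not from the thread):

```xml
<!-- Hedged sketch: XPathEntityProcessor can run a stylesheet (the "xsl"
     attribute) over the fetched XML before field extraction.
     url/xsl values below are assumptions for illustration only. -->
<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <entity name="feed"
            processor="XPathEntityProcessor"
            url="http://example.com/feed.xml"
            xsl="xslt/feed-transform.xsl"
            useSolrAddSchema="true"/>
  </document>
</dataConfig>
```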

SV: Using XSLT with DIH for a URLDataSource

2010-02-22 Thread Roland Villemoes
Hi (thanks a lot) Yes, The full stacktrace is this: 22-02-2010 08:37:00 org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Error initializing XSL Processing Document # 1 at

regarding queryboost

2010-02-22 Thread Smith G
Hello, I have explored the ranking formula. As far as I understand, it seems the query-boost value is used only in queryNorm (to be exact: in sumOfSquaredWeights), and inversely at that. queryNorm(q) = queryNorm(sumOfSquaredWeights) = 1 / sumOfSquaredWeights^½
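
The inverse relationship described above can be sanity-checked with plain arithmetic (a toy illustration of Lucene's queryNorm formula, not Solr code):

```python
import math

def query_norm(sum_of_squared_weights):
    # Lucene's queryNorm: 1 / sqrt(sumOfSquaredWeights).
    # A larger boost raises sumOfSquaredWeights, which *lowers* queryNorm -
    # the inverse effect the poster is describing.
    return 1.0 / math.sqrt(sum_of_squared_weights)

print(query_norm(4.0))   # 0.5
print(query_norm(16.0))  # 0.25 - bigger sum, smaller norm
```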

multicore setup and security

2010-02-22 Thread Jorg Heymans
Hi, What is the recommended pattern for securing a multicore solr instance, accessed by different applications ? In our case, we need to prevent application A from accessing the core of application B. Also, we need to avoid the use of username/password authentication wherever possible. I have

Re: Why ASCIIFoldingFilter is not a CharFilter

2010-02-22 Thread Shalin Shekhar Mangar
I wasn't suggesting that they should be changed but trying to understand why. This makes sense. Thanks Erik and Robert. On Mon, Feb 22, 2010 at 6:16 AM, Robert Muir rcm...@gmail.com wrote: right, most stemmers expect the diacritics to be in their input to work correctly, too. On Sun, Feb 21,

Re: Using XSLT with DIH for a URLDataSource

2010-02-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
The XSLT file looks fine. Is the location of the file correct? On Mon, Feb 22, 2010 at 2:57 PM, Roland Villemoes r...@alpha-solutions.dk wrote: Hi (thanks a lot) Yes, the full stacktrace is this: 22-02-2010 08:37:00 org.apache.solr.handler.dataimport.DataImporter doFullImport

Re: Why ASCIIFoldingFilter is not a CharFilter

2010-02-22 Thread Robert Muir
Shalin, yeah. I guess, in my opinion, diacritics handling in conjunction with a stemmer is unfortunately not very easy to do without getting weird results. For example, the snowball stemmers usually expect these diacritics to be there; they are looking for something closer to the proper

how to patch solr

2010-02-22 Thread Ranveer Kumar
Hi All, I have no idea how to apply a patch. I am using the Windows OS and I need to apply https://issues.apache.org/jira/secure/attachment/12407047/SOLR-1139.patch Currently I downloaded Solr 1.4 and am using it. Do I need to download the source code and compile it, or can I patch the jar (compiled) file

Re: how to patch solr

2010-02-22 Thread Paul Libbrecht
Ranveer, there are many ways on Windows as well. An easy way is provided in IntelliJ IDEA's support for Subversion. There must be a similar one in Eclipse. And of course, there's the gang of command-line tools provided by Cygwin. paul On 22-Feb-10, at 15:15, Ranveer Kumar wrote: Hi All,

SV: Using XSLT with DIH for a URLDataSource

2010-02-22 Thread Roland Villemoes
You're right! It was as simple (stupid!) as that. Thanks a lot (for your time .. very appreciated) Roland -Original message- From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On behalf of Noble Paul നോബിള്‍ नोब्ळ् Sent: 22 February 2010 14:01 To:

Re: how to patch solr

2010-02-22 Thread Erick Erickson
You need to apply the patch to the source, then compile the source. Applying the patch can be done in any modern IDE, but it may take some poking. In Eclipse, bring up the context menu on the project, Team > Apply Patch. Or you can use the patch command directly (install svn first), see
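
For the command-line route, here is a minimal self-contained demonstration of how `patch` applies a unified diff (the file names below are made up for the demo; for Solr you would run `patch -p0 < SOLR-1139.patch` from the source checkout root):

```shell
# Demo: create a file, derive a unified diff, apply it with patch(1).
mkdir -p /tmp/patchdemo
cd /tmp/patchdemo
printf 'hello\n' > before.txt
printf 'hello patched\n' > after.txt
diff -u before.txt after.txt > demo.patch || true  # diff exits 1 when files differ
cp before.txt target.txt
patch target.txt < demo.patch                      # apply the hunk to a copy
cat target.txt
```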

Re: Need feedback on solr security

2010-02-22 Thread Jan Høydahl / Cominvent
Hi, Does open for public mean end users through browser or web sites through API? In either case you should have a front end proxying the traffic through to Solr, which explicitly allows only parameters that you allow. -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 17.

Highlighting inside a field with HTML contents

2010-02-22 Thread Xavier Schepler
Hello, this field would not be searched, but it would be used to display results. A query could be: q=table&hl=true&hl.fl=htmlfield&hl.fragsize=0 It would be tokenized with the HTMLStripStandardTokenizerFactory, then analyzed the same way as the searchable fields. Could this result in
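
The field setup being described would look something like this schema.xml fragment (a sketch; the type and field names are assumptions, and the filter chain should mirror whatever the searchable fields actually use):

```xml
<!-- Hypothetical sketch: strip HTML at tokenization time, then analyze
     like the searchable fields. Names here are illustrative only. -->
<fieldType name="html_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="htmlfield" type="html_text" indexed="true" stored="true"/>
```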

Understanding delta import

2010-02-22 Thread adeelmahmood
hi there I am having some trouble understanding delta import and how its different from full import .. from what I can tell the only difference is that it has the clean parameter set to false by default .. otherwise as far as setting up your query to use the data_import_last_index_time .. you can

Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-22 Thread Jay Hill
Looks like multi-threaded support was added to the DIH recently: http://issues.apache.org/jira/browse/SOLR-1352 -Jay On Fri, Feb 19, 2010 at 6:27 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Glen may be referring to LuSql indexing with multiple threads? Does/can DIH do that, too?

Re: score computation for dismax handler

2010-02-22 Thread Jay Hill
Set the tie parameter to 1.0. This param is set between 0.0 (pure disjunction maximum) and 1.0 (pure disjunction sum): http://wiki.apache.org/solr/DisMaxRequestHandler#tie_.28Tie_breaker.29 -Jay On Thu, Feb 18, 2010 at 4:24 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: Hi ,
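
The tie combination can be sketched numerically (a toy illustration of how DisMax blends per-field scores, not Solr code):

```python
def dismax_score(subscores, tie):
    # DisMax takes the best per-field score, plus tie times the rest:
    # tie=0.0 -> pure max, tie=1.0 -> plain sum.
    best = max(subscores)
    return best + tie * (sum(subscores) - best)

scores = [0.8, 0.3, 0.1]
print(dismax_score(scores, 0.0))  # 0.8  (pure disjunction max)
print(dismax_score(scores, 1.0))  # 1.2  (pure disjunction sum)
```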

Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-22 Thread Mark Miller
and a ramBufferSize of 3GB? If you had actually used greater than 2GB of it, you would have seen problems as an int overflowed - which is why it's now hard-limited: if (mb > 2048.0) { throw new IllegalArgumentException("ramBufferSize " + mb + " is too large; should be comfortably less than
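
The 2048 MB cap follows directly from signed 32-bit arithmetic; a quick check (plain arithmetic illustrating the overflow Mark describes, not Lucene code):

```python
# A signed 32-bit int tops out at 2**31 - 1 bytes, i.e. just under 2048 MB,
# so a 2048 MB (or 3 GB) buffer cannot be addressed with int byte offsets.
INT_MAX = 2**31 - 1
print(INT_MAX / (1024 * 1024))        # ~2047.999... MB
print(2048 * 1024 * 1024 > INT_MAX)   # True - would overflow
```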

Re: optimize is taking too much time

2010-02-22 Thread Jay Hill
With a mergeFactor set to anything > 1 you would never have only one segment - unless you optimized. So Lucene will never naturally merge all the segments into one. Unless, I suppose, the mergeFactor was set to 1, but I've never tested that. It's hard to picture how that would work. If I

Performance issue in indexing the data with DIH when using subqueries

2010-02-22 Thread JavaGuy84
Hi, I am facing a performance issue when I am trying to index the data using DIH. I have a model as below. Tables: Object, ObjectProperty, ObjectRelationship. Object -- ObjectProperty: one-to-many relationship. Object -- ObjectRelationship: one-to-many relationship. We need to get
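
A DIH config for that model presumably has nested entities along these lines (a hypothetical sketch; the table and column names are assumptions). The usual performance problem with this shape is that each parent row fires the two sub-queries separately, one round trip per Object:

```xml
<!-- Hypothetical data-config.xml sketch of the described model;
     all table/column names are assumed for illustration. -->
<document>
  <entity name="object" query="SELECT id, name FROM Object">
    <entity name="property"
            query="SELECT name, value FROM ObjectProperty
                   WHERE object_id = '${object.id}'"/>
    <entity name="relationship"
            query="SELECT related_id, type FROM ObjectRelationship
                   WHERE object_id = '${object.id}'"/>
  </entity>
</document>
```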

Re: optimize is taking too much time

2010-02-22 Thread David Smiley @MITRE.org
Your response contradicts the wiki's description of mergeFactor: http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor -- which clearly states that the indexes are merged into a single segment. It makes no reference to optimize to trigger this condition. If what you say is true, and we

Re: optimize is taking too much time

2010-02-22 Thread Yonik Seeley
On Sun, Feb 21, 2010 at 2:20 PM, David Smiley @MITRE.org dsmi...@mitre.org wrote: I've always thought that these two events were effectively equivalent.  -- the results of an optimize vs the results of Lucene _naturally_ merging all segments together into one. Correct. Occasionally one hit's

Spell check returns strange suggestion

2010-02-22 Thread darniz
Hello All. Please reply to this ASAP. I am using the index-based spellchecker; right now I copy only model and make names and some other fields to my spellcheck field. Hence my spellcheck field consists of only 120 words. The issue is: if I type hond I get back honda, which is fine. But when I type term

Re: optimize is taking too much time

2010-02-22 Thread Mark Miller
Also, a mergefactor of 1 is actually invalid - 2 is the lowest you can go. -- - Mark http://www.lucidimagination.com

Re: Spell check returns strange suggestion

2010-02-22 Thread Markus Jelsma
darniz said: Hello All Please reply to this ASAP I am using indexbasedSpellchecker right now i copy only model, and make names and some other fields to my spellcheck field. Hence my spell check field consists of only 120 words. The issue is if i type hond i get back honda which is fine.

Re: some scores to 0 using omitNorms=false

2010-02-22 Thread Lance Norskog
http://wiki.apache.org/lucene-java/ConceptsAndDefinitions On Thu, Feb 18, 2010 at 7:13 AM, Raimon Bosch raimon.bo...@gmail.com wrote: I am not an expert in the Lucene scoring formula, but omitNorms=false makes the scoring formula a little bit more complex, taking into account boosting for

Re: Faceting

2010-02-22 Thread Lance Norskog
There are several component libraries for UIMA on the net: http://incubator.apache.org/uima/external-resources.html 2010/2/18 José Moreira matrixowns...@gmail.com: have you used UIMA? i did a quick read on the docs and it seems to do what i'm looking for. 2010/2/11 Otis Gospodnetic

Re: Deleting spelll checker index

2010-02-22 Thread Lance Norskog
More precisely, remnant terms from deleted documents slowly disappear as you add new documents or when you optimize the index. On Thu, Feb 18, 2010 at 11:09 AM, darniz rnizamud...@edmunds.com wrote: Thanks If this is really the case, i declared a new field called mySpellTextDup and retired

Re: parsing strings into phrase queries

2010-02-22 Thread Lance Norskog
Thanks Robert, that helped. On Thu, Feb 18, 2010 at 5:48 AM, Robert Muir rcm...@gmail.com wrote: i gave it a rough shot Lance, if there's a better way to explain it, please edit On Wed, Feb 17, 2010 at 10:23 PM, Lance Norskog goks...@gmail.com wrote: That would be great. After reading this

Re: Spell check returns strange suggestion

2010-02-22 Thread darniz
Thanks for the prompt reply. I added the parameter <str name="accuracy">0.7</str> to my config and this seems to take care of it. Words which are very close to the misspelled words seem to come back now. darniz Markus Jelsma - Buyways B.V. wrote: darniz said: Hello All Please reply to
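
In context, that parameter sits inside the spellchecker definition in solrconfig.xml, roughly like this (a sketch; the component, dictionary, and field names are assumptions):

```xml
<!-- Hypothetical sketch of where the accuracy setting lives;
     names other than "accuracy" are illustrative. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="accuracy">0.7</str>
  </lst>
</searchComponent>
```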

Re: Solr 1.5 in production

2010-02-22 Thread Grant Ingersoll
On Feb 20, 2010, at 8:53 AM, Asif Rahman wrote: One piece of functionality that I need is the ability to index a spatial shape. I've begun implementing this for solr 1.4 using just the spatial capabilities in lucene with a custom update processor and query parser. At this point I'm only

Re: optimize is taking too much time

2010-02-22 Thread Jay Hill
Thanks for clearing that up guys, I misspoke slightly. It's just that, in a running system, it's probably very rare that there is only a single segment for any meaningful length of time. Unless that merge-down-to-one occurs right when indexing stops there will almost always be a new (small)

Odd wildcard behavior

2010-02-22 Thread cjkadakia
I'm getting very odd behavior from a wildcard search. For example, when I'm searching for docs with a name containing the word International the following occur: q=name:(inte*) -- found International q=name:(intern*) -- found International q=name:(interna*) -- did not find International

Re: Odd wildcard behavior

2010-02-22 Thread Erick Erickson
Several things: You're including a stemmer in your field, that'll transform your indexed terms. Have you used the schema browser in the admin page to take a look at the results of indexing with stemming? Luke is also good for this. What shows up when you add debugQuery=true to your search?

Re: How does one sort facet queries?

2010-02-22 Thread Chris Hostetter
: All sorting of facets works great at the field level (count/index)...all good : there...but how is sorting accomplished with range queries? The solrj : response doesn't seem to maintain the order the queries are sent in, and the The facet_queries section of the facet_counts is in the order that

Re: optimize is taking too much time

2010-02-22 Thread Yonik Seeley
On Mon, Feb 22, 2010 at 6:39 PM, Jay Hill jayallenh...@gmail.com wrote: It's just that, in a running system, it's probably very rare that there is only a single segment for any meaningful length of time. Right - but the performance impact of a huge merge can be non-trivial. People wishing to

Re: Odd wildcard behavior

2010-02-22 Thread Robert Muir
porter stemmer turns 'international' into 'intern' On Mon, Feb 22, 2010 at 6:57 PM, cjkadakia cjkada...@sonicbids.com wrote: I'm getting very odd behavior from a wildcard search. For example, when I'm searching for docs with a name containing the word International the following occur:

Re: Odd wildcard behavior

2010-02-22 Thread cjkadakia
If stemming is the underlying issue here, then are there any suggestions? Would I have to remove the SnowballPorterFilterFactory from both the index AND the query? Just to clarify, the ability to search on foos and return foo (and vice-versa) is quite important, but this other issue with
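
One common approach that keeps both behaviors is to leave the stemmed field in place for recall and copy into an unstemmed sibling used for wildcard queries. A schema.xml sketch (field and type names here are assumptions, not from the thread):

```xml
<!-- Hypothetical: stemmed field for foos/foo matching, unstemmed copy
     (no SnowballPorterFilterFactory in its analyzer) for wildcards. -->
<field name="name" type="text_stemmed" indexed="true" stored="true"/>
<field name="name_exact" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
```

Wildcard searches would then be issued against name_exact, e.g. q=name_exact:(interna*).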

uniqueKey: required or not required

2010-02-22 Thread P Franks
All, Various books and online documentation state that the uniqueKey is not required, and we do not have a need for a unique key, so I tried removing it from the schema or setting it with no value, but Solr will not start without it defined. I do have an integer value that will be unique to each
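
Since an integer column that is unique per document is already available, the pragmatic route is to declare it as the key. A schema.xml sketch (field name and type are assumptions):

```xml
<!-- Hypothetical: use the existing unique integer as the uniqueKey. -->
<field name="id" type="int" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```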

Re: Understanding delta import

2010-02-22 Thread adeelmahmood
any ideas ??? adeelmahmood wrote: hi there I am having some trouble understanding delta import and how its different from full import .. from what I can tell the only difference is that it has the clean parameter set to false by default .. otherwise as far as setting up your query to use

Re: Solr 1.5 in production

2010-02-22 Thread Asif Rahman
We're modeling hyperlocal news articles. Each article is indexed with a shape that corresponds to the region of the map that is covered by the source of the article. We considered modeling the locality of the articles as points, but that approach would have limited our search options to bounding