Re: Sorting and searching on a field

2011-12-15 Thread pravesh
I have read about the option of copying this to a different field, using one for searching by tokenizing, and one for sorting. That would be the optimal way of doing it, since sorting requires the field not to be analyzed/tokenized, while searching requires it. The copy field would be the

Re: Large RDBMS dataset

2011-12-15 Thread Finotti Simone
Thank you (and all the others who spent time answering me) very much for your insights! I don't know how I managed to miss CachedSqlEntityProcessor, but it seems that's just what I need. Bye. From: Gora Mohanty [g...@mimirtech.com] Sent:

Re: XPath with ExtractingRequestHandler

2011-12-15 Thread Péter Király
Hi, maybe I am wrong, but the // should be at the beginning of the expression, like //xhtml:div[@class='bibliographicData']/descendant:node(), or if you want to search the div inside body, you have to use descendant like

Re: Solr Search Across Multiple Cores not working when quering on specific field

2011-12-15 Thread ravicv
Hi, I was able to do it by changing the datatype of all fields to textgen from textTight. I am not sure what's wrong with the textTight datatype. Also, can you please suggest the best way to index huge database data? Currently I have tried with DataImportHandler and CSV import. But both are giving almost

Specifing BatchSize parameter in db-data-config.xml will improve performance?

2011-12-15 Thread ravicv
Hi, I am using Oracle Exadata as my DB. I want to index nearly 4 crore (40 million) rows. I have tried with specifying batchsize as 1. and without specifying batchsize. But both tests take nearly the same time. Could anyone suggest the best way to index huge data quickly? -- View this message in context:

Re: XPath with ExtractingRequestHandler

2011-12-15 Thread Michael Kelleher
Yeah, I tried: //xhtml:div[@class='bibliographicData']/descendant:node() also tried //xhtml:div[@class='bibliographicData'] Neither worked. The DIV I need also had an ID value, and I tried both variations on ID as well. Still nothing. XPath handling for Tika seems to be pretty basic and

Re: Delta Replication in SOLR

2011-12-15 Thread Bob Stewart
Replication only copies new segment files, so unless you are optimizing on commit it will not copy the entire index. Make sure you do not optimize your index. Optimizing merges to a single segment and is not necessary. When new docs are added new small segment files are created so typical

Re: Solr Join with Dismax

2011-12-15 Thread Pascal Dimassimo
Thanks Hoss! Here it is: https://issues.apache.org/jira/browse/SOLR-2972 On Wed, Dec 14, 2011 at 4:47 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I have been doing more tracing in the code. And I think that I understand a : bit more. The problem does not seem to be dismax+join, but

Re: Faceting with null dates

2011-12-15 Thread Erick Erickson
Hmmm, I'm not sure I'm following this. "Is there a way to query the index to not give me non-null dates in return" So you want null dates? and: "which gives me some unwanted non-null dates in the result set" which seems to indicate you do NOT want null dates. I honestly don't know what your desired

Re: Using LocalParams in StatsComponent to create a price slider?

2011-12-15 Thread Erick Erickson
I really don't understand what you're asking, could you clarify with an example or two? Best Erick On Wed, Dec 14, 2011 at 10:36 AM, Mark Schoy hei...@gmx.de wrote: Hi, I'm using the StatsComponent to receive the lower and upper bounds of a price field to create a price slider. If someone

Re: Solr Search Across Multiple Cores not working when quering on specific field

2011-12-15 Thread Erick Erickson
I suspect that the distributed searching is working just fine in both cases, but your querying isn't doing what you expect due to differences in the analysis chain. I'd recommend spending some time with the admin/analysis page to see what is actually being parsed. And be aware that wildcards from

Announcement of Soldash - a dashboard for multiple Solr instances

2011-12-15 Thread Alexander Valet | edelight
We use Solr quite a bit at edelight -- and love it. However, we encountered one minor peeve: although each individual Solr server has its own dashboard, there's no easy way of getting a complete overview of an entire Solr cluster and the status of its nodes. Over the last weeks our own Aengus

how to setup to archive expired documents?

2011-12-15 Thread Robert Stewart
We have a large (100M) index where we add about 1M new docs per day. We want to keep index at a constant size so the oldest ones are removed and/or archived each day (so index contains around 100 days of data). What is the best way to do this? We still want to keep older data in some archive

Re: Migrate Lucene 2.9 To SOLR

2011-12-15 Thread Anderson vasconcelos
OK. Thanks for the help. I'm going to try to migrate. 2011/12/14 Chris Hostetter hossman_luc...@fucit.org : I have an old project that uses Lucene 2.9. Is it possible to use the index : created by Lucene in SOLR? May I just copy the index to the data directory of : SOLR, or does some mechanism exist to import

Re: NumericRangeQuery: what am I doing wrong?

2011-12-15 Thread Jay Luker
On Wed, Dec 14, 2011 at 5:02 PM, Chris Hostetter hossman_luc...@fucit.org wrote: I'm a little lost in this thread ... if you are programmatically constructing a NumericRangeQuery object to execute in the JVM against a Solr index, that suggests you are writing some sort of Solr plugin (or

Re: Large RDBMS dataset

2011-12-15 Thread Mikhail Khludnev
CachedSqlEntityProcessor joins your tables fine. But be aware that it works in a single thread only. On Thu, Dec 15, 2011 at 12:14 PM, Finotti Simone tech...@yoox.com wrote: CachedSqlEntityProcessor

RE: how to setup to archive expired documents?

2011-12-15 Thread Avni, Itamar
What about managing a core for each day? This way the deletion/archive is very simple. No holes in the index (which often happen when deleting document by document). Indexing is done against core [today-0]. The query is done against cores [today-0],[today-1]...[today-99]. Quite a headache. Itamar
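
A minimal sketch of the core-per-day bookkeeping described above, in plain Java with no Solr calls. The `core_yyyyMMdd` naming scheme and the `DailyCores` helper are assumptions made for illustration; the thread does not say how the cores would be named or created.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the "core_yyyyMMdd" naming scheme is an assumption, not from the thread.
final class DailyCores {
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd

    // Name of the core that holds the documents for one calendar day.
    static String coreName(LocalDate day) {
        return "core_" + day.format(DAY);
    }

    // Cores to query: [today-0], [today-1], ..., [today-(windowDays-1)].
    static List<String> queryCores(LocalDate today, int windowDays) {
        List<String> cores = new ArrayList<>(windowDays);
        for (int age = 0; age < windowDays; age++) {
            cores.add(coreName(today.minusDays(age)));
        }
        return cores;
    }

    // The core that has just fallen out of the window and can be archived or dropped whole.
    static String expiredCore(LocalDate today, int windowDays) {
        return coreName(today.minusDays(windowDays));
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2011, 12, 15);
        System.out.println("index into: " + coreName(today));
        System.out.println("query:      " + queryCores(today, 100).size() + " cores");
        System.out.println("archive:    " + expiredCore(today, 100));
    }
}
```

Expiry then becomes a matter of unloading one whole core per day instead of deleting documents one by one, at the price of fanning every query out to all cores in the window.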

Re: how to setup to archive expired documents?

2011-12-15 Thread Robert Stewart
I think managing 100 cores will be too much of a headache. Also, performance of querying 100 cores will not be good (need page_number*page_size from 100 cores, and then merge). I am thinking of having around 10 SOLR instances, each one with about 10M docs. Always search all 10 nodes. Index using some hash(doc) to
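
The message is cut off after "hash(doc)", so the following is only a sketch of one common reading: the hash of a document's unique key picks which of the 10 instances receives it. `ShardRouter` and `shardFor` are hypothetical names, and using `String.hashCode()` as the hash is an assumption.

```java
// Hypothetical router: assumes the document's unique key is a String and that
// String.hashCode() is an acceptable hash. Neither detail comes from the thread.
final class ShardRouter {

    // Map a document key to a shard index in [0, numShards).
    static int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 10;
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            System.out.println(id + " -> shard " + shardFor(id, numShards));
        }
    }
}
```

The same key always lands on the same shard, so updates and deletes reach the node that holds the document. Changing `numShards` later remaps most keys, which is one reason to settle on the shard count up front.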

Core overhead

2011-12-15 Thread Yury Kats
Does anybody have an idea, or better yet, measured data, to see what the overhead of a core is, both in memory and speed? For example, what would be the difference between having 1 core with 100M documents versus having 10 cores with 10M documents?

Re: Core overhead

2011-12-15 Thread Robert Stewart
I don't have any measured data, but here are my thoughts. I think overall memory usage would be close to the same. Speed will be slower in general, because if search speed is approx log(n) then 10 * log(n/10) > log(n), and also if merging results you have overhead in the merge step and also if
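
To put numbers on the comparison of 10 * log(n/10) against log(n), here is a small sketch that takes the message's "search cost is roughly log(n)" model at face value. The base-2 logarithm and the `SearchCost` helper are assumptions for illustration; this is not a measurement of Solr.

```java
// Illustrative arithmetic only: assumes cost ~ log2(docs) per index searched,
// which is the rough model from the message, not measured Solr behaviour.
final class SearchCost {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Modelled cost of searching one index holding all documents.
    static double singleIndexCost(long docs) {
        return log2(docs);
    }

    // Modelled total cost of searching every shard of an evenly split index.
    static double shardedCost(long docs, int shards) {
        return shards * log2((double) docs / shards);
    }

    public static void main(String[] args) {
        long docs = 100_000_000L;
        System.out.printf("1 core  x 100M docs: %.2f%n", singleIndexCost(docs));
        System.out.printf("10 cores x 10M docs: %.2f%n", shardedCost(docs, 10));
    }
}
```

Under this model the 10-core layout does several times more total work for the same query. The model counts total work only and says nothing about latency when the cores are searched in parallel.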

Re: Trim and copy a solr field

2011-12-15 Thread Juan Grande
Hi Swapna, Do you want to modify the *indexed* value or the *stored* value? The analyzers modify the indexed value. To modify the stored value, the only option that I'm aware of is to write an UpdateProcessor that changes the document before it's indexed. *Juan* On Tue, Dec 13, 2011 at 2:05

Re: Core overhead

2011-12-15 Thread Yury Kats
On 12/15/2011 1:07 PM, Robert Stewart wrote: I think overall memory usage would be close to the same. Is this really so? I suspect that the consumed memory is in direct proportion to the number of terms in the index. I also suspect that if I divided 1 core with N terms into 10 smaller cores,

RE: Core overhead

2011-12-15 Thread Robert Petersen
I am running eight cores, each core serves up different types of searches so there is no overlap in their function. Some cores have millions of documents. My search times are quite fast. I don't see any real slowdown from multiple cores, but you just have to have enough memory for them. Memory

Re: Core overhead

2011-12-15 Thread Robert Stewart
It is true that the number of terms may be much more than N/10 (or even N for each core), but it is the number of docs per term that will really matter. So you can have N terms in each core but each term has 1/10 the number of docs on avg. 2011/12/15 Yury Kats yuryk...@yahoo.com: On 12/15/2011 1:07 PM,

Re: Core overhead

2011-12-15 Thread Robert Stewart
One other thing I did not mention is GC pauses. If you have smaller heap sizes, you would have fewer very long GC pauses, so that can be an advantage of having many cores (if cores are distributed into separate SOLR instances, as separate processes). I think you can expect 1 second pause for each GB

Call RequestHandler from QueryComponent

2011-12-15 Thread Maria Vazquez
Hi! I have a solrconfig.xml like: <requestHandler name="/ABC" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">all</str> <int name="start">0</int> <int name="rows">10</int> <str name="wt">ABC</str> <str name="sort">score desc,rating asc</str> <str name="fq">CUSTOM FQ</str>

Re: Core overhead

2011-12-15 Thread Yury Kats
On 12/15/2011 1:41 PM, Robert Petersen wrote: loading. Try it out, but make sure that the functionality you are actually looking for isn't sharding instead of multiple cores... Yes, but the way to achieve sharding is to have multiple cores. The question then becomes -- how many cores

Is there an issue with hypens in SpellChecker with StandardTokenizer?

2011-12-15 Thread Brandon Fish
I am getting an error using the SpellChecker component with the query another-test java.lang.StringIndexOutOfBoundsException: String index out of range: -7 This appears to be related to this issue https://issues.apache.org/jira/browse/SOLR-1630 which has been marked as fixed. My configuration and

Re: Faceting with null dates

2011-12-15 Thread Chris Hostetter
First of all, we need to clarify some terminology here: there is no such thing as a null date in solr -- or for that matter, there is no such thing as a null value in any field. Documents either have some value(s) for a field, or they do not have any values. If you want to constrain your

edismax doesn't obey 'pf' parameter

2011-12-15 Thread entdeveloper
If I switch back and forth between defType=dismax and defType=edismax, the edismax doesn't seem to obey my pf parameter. I dug through the code a little bit and in the ExtendedDismaxQParserPlugin (Solr 3.4/Solr 3.5), the part that is supposed to add the phrase comes here: Query phrase =

Re: edismax doesn't obey 'pf' parameter

2011-12-15 Thread Chris Hostetter
: If I switch back and forth between defType=dismax and defType=edismax, the : edismax doesn't seem to obey my pf parameter. I dug through the code a I just tried a sample query using Solr 3.5 with the example configs+data. This is the query i tried...

Re: Using LocalParams in StatsComponent to create a price slider?

2011-12-15 Thread Chris Hostetter
: I really don't understand what you're asking, could you clarify with : an example or two? I *believe* the question is about wanting to exclude the effects of some fq params from the set of documents used to compute stats -- similar to how you can exclude tagged filters when generating facet

RE: Core overhead

2011-12-15 Thread Robert Petersen
Sure that is possible, but doesn't that defeat the purpose of sharding? Why distribute across one machine? Just keep all in one index in that case is my thought there... -Original Message- From: Yury Kats [mailto:yuryk...@yahoo.com] Sent: Thursday, December 15, 2011 11:47 AM To:

Poor performance on distributed search

2011-12-15 Thread ku3ia
Hi, all! I have a problem with distributed search. I downloaded one shard from my production. It has: * ~29M docs * 11 fields * ~105M terms * size of shard is: 13GB On production there are about 30 shards like it. I split this shard into 4 smaller shards, so now I have: small shard1: docs:

RE: Is there an issue with hypens in SpellChecker with StandardTokenizer?

2011-12-15 Thread Steven A Rowe
Hi Brandon, When I add the following to SpellingQueryConverterTest.java on the tip of branch_3x (will be released as Solr 3.6), the test succeeds: @Test public void testStandardAnalyzerWithHyphen() { SpellingQueryConverter converter = new SpellingQueryConverter(); converter.init(new

Re: Core overhead

2011-12-15 Thread Yury Kats
On 12/15/2011 4:46 PM, Robert Petersen wrote: Sure that is possible, but doesn't that defeat the purpose of sharding? Why distribute across one machine? Just keep all in one index in that case is my thought there... To be able to scale w/o re-indexing. Also often referred to as

Re: Is there an issue with hypens in SpellChecker with StandardTokenizer?

2011-12-15 Thread Brandon Fish
Hi Steve, I was using branch 3.5. I will try this on tip of branch_3x too. Thanks. On Thu, Dec 15, 2011 at 4:14 PM, Steven A Rowe sar...@syr.edu wrote: Hi Brandon, When I add the following to SpellingQueryConverterTest.java on the tip of branch_3x (will be released as Solr 3.6), the test

Solr AutoComplete - Address Search

2011-12-15 Thread Vijay Sampath
Hi, I'm trying to implement autocomplete functionality for Address search. I've used the KeywordTokenizerFactory and LowerCaseFilterFactory. The problem is, when I start typing numbers at the start, I don't get any results from SOLR (e.g. 3500 W South). Could you please guide on this fieldType

Re: edismax doesn't obey 'pf' parameter

2011-12-15 Thread entdeveloper
I'm observing strange results with both the correct and incorrect behavior happening depending on which field I put in the 'pf' param. I wouldn't think this should be analyzer specific, but is it? If I try:

Call RequestHandler from QueryComponent

2011-12-15 Thread Vazquez, Maria (STM)
Hi! I have a solrconfig.xml like: <requestHandler name="/ABC" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">all</str> <int name="start">0</int> <int name="rows">10</int> <str name="wt">ABC</str> <str name="sort">score desc,rating asc</str> <str name="fq">CUSTOM FQ</str>

Re: Is there an issue with hypens in SpellChecker with StandardTokenizer?

2011-12-15 Thread Brandon Fish
Yes, the branch_3x works for me as well. The addition of the OffsetAttribute probably corrected this issue. I will either switch to WhitespaceAnalyzer, patch my distribution or wait for 3.6 to resolve this. Thanks. On Thu, Dec 15, 2011 at 4:17 PM, Brandon Fish brandon.j.f...@gmail.com wrote: Hi

SearchComponents and ShardResponse

2011-12-15 Thread Ken Krugler
Hi all, I feel like I must be missing something here... I'm working on a customized version of the SearchHandler, which supports distributed searching in multiple *local* cores. Assuming you want to support SearchComponents, then my handler needs to create/maintain a ResponseBuilder, which is

RE: Core overhead

2011-12-15 Thread Robert Petersen
I see there are a lot of discussions about micro-sharding, I'll have to read them. I'm on an older version of solr and just use a master index replicating out to a farm of slaves. It always seemed to me that sharding causes a lot of background traffic when I read about it, but I never tried it out.

RE: Is there an issue with hypens in SpellChecker with StandardTokenizer?

2011-12-15 Thread Steven A Rowe
Brandon, Looks like SOLR-2509 https://issues.apache.org/jira/browse/SOLR-2509 fixed the problem - that's where OffsetAttribute was added (as you noted). I ran my test method on branches/lucene_solr_3_5/, and I got the same failure there as you did, so I can confirm that Solr 3.5 has this bug,

Re: Core overhead

2011-12-15 Thread Ted Dunning
Here is a talk I did on this topic at HPTS a few years ago. On Thu, Dec 15, 2011 at 4:28 PM, Robert Petersen rober...@buy.com wrote: I see there is a lot of discussions about micro-sharding, I'll have to read them. I'm on an older version of solr and just use master index replicating out to

Replication file become very very big

2011-12-15 Thread ZiLi
Hi all, I have run into a very strange problem. We use a Windows server as master, serving 5 Windows slaves and 3 Linux slaves. It has worked normally for 2 months. But today we found that one of the Linux slaves' index files has become very, very big (150G! The others are 300M). And we can't find

Re: cache monitoring tools?

2011-12-15 Thread Justin Caratzas
Dmitry, That's beyond the scope of this thread, but Munin essentially runs plugins, which are scripts that output graph configuration and values when polled by the Munin server. It uses a plain text protocol, so the scripts can be written in any language. Munin then feeds this

Re: Solr Version Upgrade issue

2011-12-15 Thread Pawan Darira
Thanks. I restarted from scratch; at least things have started working now. I upgraded by deploying the 3.2 war in my jboss. Also, I made the conf changes mentioned in CHANGES.txt. It did expect a separate lib directory, which was not required in 1.4. The new problem is that it's taking very long to

RE: Trim and copy a solr field

2011-12-15 Thread Swapna Vuppala
Hi Juan, I think UpdateProcessor is what I would be needing. Can you please tell me more about it, as to how it works and all ? Thanks and Regards, Swapna. -Original Message- From: Juan Grande [mailto:juan.gra...@gmail.com] Sent: Thursday, December 15, 2011 11:43 PM To:

Re: Solr sentiment analysis

2011-12-15 Thread maha
Hi, I am doing research in sentiment analysis. Please give me your valuable suggestions on how to start my research. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sentiment-analysis-tp3151415p3590933.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Solr sentiment analysis

2011-12-15 Thread Husain, Yavar
This is a generic Machine Learning question and is not related to Solr (which is what this list is for). You can ask this question on Stackoverflow.com. However, one of the approaches: just go through the chapter on Non-Negative Matrix Factorization in O'Reilly's Programming Collective Intelligence. That

Re: Solr sentiment analysis

2011-12-15 Thread maha
I am interested in working on sentiment analysis. Please help me. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sentiment-analysis-tp3151415p3590952.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Highlighter highlighting terms which are not part of the search

2011-12-15 Thread Shyam Bhaskaran
Hi Erick, I tried looking into our analyzers, adding each of the filters that we were using one by one and getting the documents indexed. During this testing it was found that when using the solr.SynonymFilterFactory on top of the latest Solr 4.0 trunk code there is an issue with

Exception using SolrJ

2011-12-15 Thread Shawn Heisey
I am seeing exceptions from some code I have written using SolrJ. I have placed it into a pastebin: http://pastebin.com/XnB83Jay I am creating a MultiThreadedHttpConnectionManager object, which I use to create an HttpClient, and that is used by all my CommonsHttpSolrServer objects, of which