Re: Text classification with Solr

2009-01-27 Thread Hannes Carl Meyer
Instead of indexing documents about 'sports' and searching for hits based upon 'basketball', 'football', etc., I simply want to index the taxonomy and classify documents into it. This is an ancient AI/data-mining discipline, but the standard methods of 'indexing' the taxonomy are/were primitive

multilanguage prototype

2009-01-27 Thread revathy arun
Hi, I have downloaded Solr 1.3.0. I need to index Chinese content. For this I have defined a new field in the schema as <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.CJKTokenizerFactory"/> </analyzer> <analyzer type="query">

Re: multilanguage prototype

2009-01-27 Thread Shalin Shekhar Mangar
Did you commit after the updates? 2009/1/27 revathy arun revas...@gmail.com Hi, I have downloaded Solr 1.3.0. I need to index Chinese content. For this I have defined a new field in the schema as <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100"> <analyzer
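
The add-then-commit sequence asked about here can be sketched as the two XML messages Solr's XML update handler accepts. This is a minimal sketch, not the poster's indexer: the field names `id` and `content` and the sample values are assumptions, and the messages would still have to be POSTed to the `/update` path of the web application.

```python
import xml.etree.ElementTree as ET


def add_message(doc):
    """Build a Solr XML <add> message from a dict of field name -> value."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def commit_message():
    """Build the commit message; added documents are not searchable before it."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


# Hypothetical document: the field names and values are illustrative only.
print(add_message({"id": "doc1", "content": "sample text"}))
print(commit_message())
```

If the add message never reaches Solr, a later commit succeeds but leaves the index empty, so both messages are worth checking.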

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi, I have committed. The admin page does not show any docs pending or committed, or any errors. Regards, Sujatha On 1/27/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Did you commit after the updates? 2009/1/27 revathy arun revas...@gmail.com Hi, I have downloaded Solr 1.3.0.

Re: multilanguage prototype

2009-01-27 Thread revathy arun
These are the stats of my update handler, but I still don't see any index created. stats: commits : 7 autocommits : 0 optimizes : 2 docsPending : 0 adds : 0 deletesById : 0 deletesByQuery : 0 errors : 0 cumulative_adds : 0 cumulative_deletesById : 0 cumulative_deletesByQuery : 0 cumulative_errors : 0

Re: multilanguage prototype

2009-01-27 Thread Shalin Shekhar Mangar
Are you looking for it in the right place? It is very unlikely that a commit happens and the index is not created. The index is usually created inside the data directory as configured in your solrconfig.xml. Can you search for *:* from the Solr admin page and see if documents are returned? On Tue,

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi Shalin, the admin page stats are as follows: searcherName : searc...@1d4c3d5 main caching : true numDocs : 0 maxDoc : 0 name: /update class: org.apache.solr.handler.XmlUpdateRequestHandler version: $Revision: 690026 $ description: Add documents with XML stats: handlerStart :

Highlighting does not work?

2009-01-27 Thread Jarek Zgoda
Solr 1.3. I'm trying to get highlighting working, with no luck so far. Query with params q=cyrus&fl=*,score&qt=standard&hl=true&hl.fl=title+description finds 182 documents in my index. All of the top 10 hits contain the word cyrus, but the highlights list is empty. The fields title and
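
For reference, the query string in this message can be assembled with proper separators and URL encoding as in the following sketch. The base URL is an assumption (a local Solr instance on its default example port); the parameter names and values are the ones quoted in the message.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr instance; adjust host, port, and path.
SOLR_SELECT = "http://localhost:8983/solr/select"


def highlight_query_url(query, highlight_fields, base=SOLR_SELECT):
    """Build a select URL that asks Solr to highlight the given fields."""
    params = [
        ("q", query),
        ("fl", "*,score"),
        ("qt", "standard"),
        ("hl", "true"),
        # hl.fl takes a list of field names; a space is encoded as '+'.
        ("hl.fl", " ".join(highlight_fields)),
    ]
    return base + "?" + urlencode(params)


print(highlight_query_url("cyrus", ["title", "description"]))
```

A well-formed URL only rules out a request problem; an empty highlights list can still come from how the fields are defined in the schema.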

Re: Highlighting does not work?

2009-01-27 Thread Jarek Zgoda
I turned these fields to indexed + stored but the results are exactly the same, no matter if I search in these fields or elsewhere. Message written on 2009-01-27 at 13:09 by Jarek Zgoda: Solr 1.3 I'm trying to get highlighting working, with no luck so far. Query with

Re: multilanguage prototype

2009-01-27 Thread Erik Hatcher
errors: 11 What were those? My hunch is your indexer had issues. What did Solr output into the console or log during indexing? Erik On Jan 27, 2009, at 6:56 AM, revathy arun wrote: Hi Shalin, The admin page stats are as follows searcherName : searc...@1d4c3d5 main caching :

Re: Highlighting does not work?

2009-01-27 Thread Jarek Zgoda
Finally found that the fields have to have an analyzer to be highlighted. Neat. Can I ask somebody to document all these requirements? Message written on 2009-01-27 at 13:49 by Jarek Zgoda: I turned these fields to indexed + stored but the results are exactly the same, no

Re: Error in Integrating JBoss 4.2 and Solr-1.3.0:

2009-01-27 Thread maveen
I am also getting the same issue. Did anyone find a solution for this? Please respond. sbutalia wrote: I'm having the same issue.. have you had any progress with this?

Re: Connection mismanagement in Solrj?

2009-01-27 Thread Walter Underwood
Making requests in parallel, using the default connection manager, which is multi-threaded, and we are reusing a single CommonsHttpSolrServer for all requests. wunder On 1/26/09 10:59 PM, Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com wrote: are you making requests in parallel ? which

Re: fastest way to index/reindex

2009-01-27 Thread Ian Connor
When you query by *:*, what order does it use? Is there a chance they will come in a different order as you page through the results (and miss/duplicate some)? Is it best to put the order explicitly by 'id' or is that implied already? On Mon, Jan 26, 2009 at 12:00 PM, Ian Connor
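
A cautious way to page through a match-all query is to state the sort order explicitly, as in the sketch below. The snippet above does not say whether the unsorted order is stable, so the explicit sort is an assumption made for safety. The field name `id` is taken from the question, and the page size of 100 is illustrative.

```python
from urllib.parse import urlencode


def page_params(page, rows=100, sort_field="id"):
    """Query parameters for one page of a match-all query in a fixed order."""
    return urlencode([
        ("q", "*:*"),
        ("sort", sort_field + " asc"),  # explicit order, same on every page
        ("start", page * rows),         # zero-based offset of the first row
        ("rows", rows),
    ])


for page in range(3):
    print(page_params(page))
```

Even with a fixed sort, documents added or deleted between requests can shift page boundaries, so this only helps when the index is not changing while it is read.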

Re: solrj delete by Id problem

2009-01-27 Thread Parisa
I found how the issue arises: when Solr warms up the new searcher with cacheLists, the issue occurs if the queryResultCache is enabled. Note: as I mentioned before, I commit with waitflush=false and waitsearcher=false, so there is a problem when the queryResultCache is on, but I don't

Re: solrj delete by Id problem

2009-01-27 Thread Shalin Shekhar Mangar
On Tue, Jan 27, 2009 at 8:51 PM, Parisa paris...@gmail.com wrote: I found how the issue arises: when Solr warms up the new searcher with cacheLists, the issue occurs if the queryResultCache is enabled. Note: as I mentioned before, I commit with waitflush=false and waitsearcher=false

Re: QParserPlugin

2009-01-27 Thread Karl Wettin
So it was me defining it in schema.xml rather than solrconfig.xml. 17:17 erikhatcher where are you defining the qparser plugin? 17:18 erikhatcher it's very odd... if it isn't picking them up but you reference them, it would certainly give an error 17:18 karlwettin as a first level child to

Re: Text classification with Solr

2009-01-27 Thread Neal Richter
On Tue, Jan 27, 2009 at 1:36 AM, Hannes Carl Meyer m...@hcmeyer.com wrote: Yeah, know it, the challenge on this method is the calculation of the score and parametrization of thresholds. Not as worried about score itself as the score thresholds for prediction in/out. Is it really necessary to

query with stemming, prefix and fuzzy?

2009-01-27 Thread Gert Brinkmann
Hello, I am trying to get Solr to work properly. I have set up a Solr test server (using Jetty as mentioned in the tutorial). Also I had to modify the schema.xml so that I have different fields for different languages (with their own stemmers) that occur in the content management system that I am

Re: Connection mismanagement in Solrj?

2009-01-27 Thread Yonik Seeley
That's interesting. SolrJ doesn't touch HTTPClient params if one is provided in the constructor. I guess I'd try to sniff the headers first and see if any difference sticks out between the clients. I normally just use netcat and pretend to be the solr server. -Yonik On Tue, Jan 27, 2009 at

Re: Connection mismanagement in Solrj?

2009-01-27 Thread Ryan McKinley
if you use this constructor: public CommonsHttpSolrServer(URL baseURL, HttpClient client) then solrj never touches the HttpClient configuration. I normally reuse a single CommonsHttpSolrServer as well. On Jan 27, 2009, at 9:52 AM, Walter Underwood wrote: Making requests in parallel,

multiple indexes

2009-01-27 Thread Jae Joo
Hi, I would like to know how this can be implemented. Index1 has fields id,1,2,3 and index2 has fields id,5,6,7. The ID in both indexes is the unique ID. Can I use a kind of distributed search and/or multicore to search, sort, and facet through 2 indexes (index1 and index2)? Thanks, Jae joo

Re: Text classification with Solr

2009-01-27 Thread Karl Wettin
On 27 Jan 2009 at 17:23, Neal Richter wrote: Is it really necessary to use Solr for it? Things go much faster with the Lucene low-level API, and much faster still if you load the classification corpus into RAM. Good points. At the moment I'd rather have a daemon with a service API..

index size tripled during optimization

2009-01-27 Thread Qingdi
Hi, starting about one week ago, our index size triples during optimization. The current index statistics are: numDocs : 192702132 size: 76G. We optimize after every 6M document updates. Since we keep getting new data, the index size increases every day. Before, the index size was

Re: Indexing documents in multiple languages

2009-01-27 Thread Erick Erickson
First, I'd search the mail archive for the topic of languages, it's been discussed often and there's a wealth of information that might be of benefit, far more information than I can remember. As to whether your approach will be too big, too slow..., you really haven't given enough information to

Optimizing & Improving results based on user feedback

2009-01-27 Thread Matthew Runo
Hello folks! We've been thinking about ways to improve organic search results for a while (really, who hasn't?) and I'd like to get some ideas on ways to implement a feedback system that uses user behavior as input. Basically, it'd work on the premise that what the user actually clicked

Re: Text classification with Solr

2009-01-27 Thread Grant Ingersoll
I guess I've been called to the chalkboard... I haven't looked specifically at putting the taxonomy in Lucene/Solr, but it is an interesting idea. In reading the paper you mentioned, there are some interesting ideas there and Solr could obviously just as easily be used as Lucene, I think.

Re: Optimizing & Improving results based on user feedback

2009-01-27 Thread Walter Underwood
I've been thinking about the same thing. We have a set of queries that defy straightforward linguistics and ranking, like figuring out how to match "charlie brown" to "It's the Great Pumpkin, Charlie Brown" in October and to "A Charlie Brown Christmas" in December. I don't have any solutions yet, but I

Tools for Managing Synonyms, Elevate, etc.

2009-01-27 Thread Cohen, Mark - IST
I'm considering building some tools for our internal non-technical staff to write to synonyms.txt, elevate.xml, spellings.txt, and protwords.txt so software developers don't have to maintain them. Before my team starts building these tools, has anyone done this before? If so, are these tools

Re: Highlighting does not work?

2009-01-27 Thread Mike Klaas
They are documented in http://wiki.apache.org/solr/FieldOptionsByUseCase and in the FAQ, but I agree that it could be more readily accessible. -Mike On 27-Jan-09, at 5:26 AM, Jarek Zgoda wrote: Finally found that the fields have to have an analyzer to be highlighted. Neat. Can I ask

Re: Connection mismanagement in Solrj?

2009-01-27 Thread Jon Baer
Could it be the framework you are using around it? I know some IOC containers will auto pool objects underneath as a service without you really knowing it is being done or has to be explicitly turned off. Just a thought. I use a single server for all requests behind a Hivemind setup ... umm not

Re: Setting dataDir in multicore environment

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
I shall give a patch today On Tue, Jan 27, 2009 at 11:58 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Oh I see, thanks for the clarification. Unfortunately this brings me back to same problem I started with: implicit properties aren't available when managing indexes through the REST

Re: Connection mismanagement in Solrj?

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
If you are making requests in parallel, then it is likely that you see many connections open at a time. They will get cleaned up over time. But if you wish to clean them up explicitly, use httpclient.getHttpConnectionManager()#closeIdleConnections() On Tue, Jan 27, 2009 at 8:22 PM, Walter

question about dismax and parentheses

2009-01-27 Thread surfer10
Hello, dear members. I'm a little confused about dismax syntax. As far as I know (and I might be wrong) it supports the default query language, such as +WORD -WORD. What about parentheses? The title of my doc consists of WORD1 WORD2 WORD3. When I'm trying to search +WORD1 +(WORD2 WORD4) + WORD3 it

[dummy question] applying patch

2009-01-27 Thread surfer10
I'm a bit of a newbie with the Java compiler, so could you please tell me what tools are used to apply the patch SOLR-236 (Field grouping)? Does it need to be applied to the current solr-1.3 (and nightly builds of 1.4), or is it already included? Which batch file in the distribution is used to compile Solr?

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi, this is the only info in the Tomcat log at indexing: Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=191 I don't see any other errors in the logs. When I use curl to update I get a success message. and commit

Store limited text

2009-01-27 Thread Gargate, Siddharth
Hi all, is it possible to store only limited text in the field, say, max 1 MB? The field maxfieldlength limits only the number of tokens to be indexed, but the complete content is still stored. Thanks, Siddharth
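
One client-side workaround, separate from any Solr setting, is to cut the text before it is sent for indexing. The sketch below is only that: the function name `truncate_utf8` is hypothetical, and the 1 MB default simply mirrors the limit mentioned in the question.

```python
def truncate_utf8(text, max_bytes=1024 * 1024):
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The input is valid UTF-8, so the only invalid bytes after slicing are a
    # trailing partial character; errors="ignore" drops exactly those bytes.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(truncate_utf8("abcdef", 4))
```

Truncating by bytes rather than characters keeps the stored size predictable for multi-byte text, at the cost of a slightly variable character count.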

Re: [dummy question] applying patch

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
Since you are asking about a 'batch file', are you using Windows? I recommend using TortoiseSVN to apply the patch. On Wed, Jan 28, 2009 at 10:05 AM, surfer10 mainp...@mail.ru wrote: I'm a bit of a newbie with the Java compiler, so could you please tell me what tools are used to apply the patch SOLR-236 (Field

Re: Setting dataDir in multicore environment

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is a patch given for SOLR-883 . On Wed, Jan 28, 2009 at 9:43 AM, Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com wrote: I shall give a patch today On Tue, Jan 27, 2009 at 11:58 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Oh I see, thanks for the clarification. Unfortunately

Re: question about dismax and parentheses

2009-01-27 Thread surfer10
I found Hoss's explanations at http://www.nabble.com/Dismax-and-Grouping-query-td12938168.html#a12938168 and it seems I can't do this. So my question becomes the following: can I join multiple dismax queries into one? For instance, if I'm looking for +WORD1 +(WORD2 WORD3) it can be

Re: Store limited text

2009-01-27 Thread Chris Harris
If you're using a Solr build post-r721758, then copyField has a maxChars property you can take advantage of. I'm probably misremembering some of the exact names of these elements/attributes, but you can basically have this in your schema.xml: <field name="f" indexed="true" stored="false" /> <field

Re: Text classification with Solr

2009-01-27 Thread Neal Richter
On Tue, Jan 27, 2009 at 2:21 PM, Grant Ingersoll gsing...@apache.org wrote: One of the things I am interested in is the marriage of Solr and Mahout (which has some Genetic Algorithms support) and other ML (Weka, etc.) tools. [snip] I love it, good to know you are thinking big here. Here's

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi, I am getting this error in the Tomcat log file on passing Chinese text to the content field. The content field uses the CJK tokenizer and is defined as <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.CJKTokenizerFactory"/>

Re: Optimizing & Improving results based on user feedback

2009-01-27 Thread Neal Richter
OK I've implemented this before, written academic papers and patents related to this task. Here are some hints: - you're on the right track with the editorial boosting elevators - http://wiki.apache.org/solr/UserTagDesign - be darn careful about assuming that one click is enough evidence