RE: Customizing Solr to handle Leading Wildcard queries

2009-01-27 Thread Jana, Kumar Raja
Hi,

Thanks Otis, Newton and everyone else for the help on this issue.

Most of the data I index are documents like PDFs, Word docs, OpenOffice
documents, etc. I store the content of the document in a field called
content, and the remaining metadata of the document (name, id, created
by, modified by, created on, etc.) in a copy field called metadata. I am
not particularly interested in enabling leading wildcard characters in
the content (although such a possibility would be a bonus). For this,
I've implemented the suggestion to store reversed strings as well as the
original strings for the metadata field. All leading wildcard queries
like "*abc" are searched as "cba*" against the reversed metadata field.
So far so good. Thank you :)

But now, I ran into the scenario where the query string is *abc* :( and
the whole thing came crashing down again. I cannot ignore such queries.
I would rather take the risk of Solr OOMing by enabling the leading
wildcard query searches.
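For the *abc* case, the delimited character n-gram idea from Otis's reply quoted below can cover infix matches too. This is only a conceptual sketch in Python, not Solr code; the character tokenization would live in an analyzer and the rewrite in your client or query parser:

```python
def analyze(token):
    """Index every character as its own token, anchored with ^ and $,
    so phrase queries over characters can emulate any wildcard."""
    return ["^"] + list(token) + ["$"]

def wildcard_to_phrase(q):
    """Rewrite a wildcard query into an equivalent character phrase:
    kumar -> '^ k u m a r $', *abc -> 'a b c $', *abc* -> 'a b c'."""
    chars = list(q.strip("*"))
    if not q.startswith("*"):
        chars.insert(0, "^")
    if not q.endswith("*"):
        chars.append("$")
    return " ".join(chars)
```

The double-ended *abc* query then becomes an ordinary phrase search instead of a wildcard scan, at the cost of a larger index.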

Can someone please tell me the steps to turn on this feature in Lucene
QueryParser? I am sure it will be helpful to many to document such a
procedure on the Wiki or somewhere else. (I am definitely going to do
that once I fix this. Too much trouble this seems to be)
Also, which queryParser does Solr use by default? 

Thanks,
Kumar




-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Thursday, January 15, 2009 10:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Customizing Solr to handle Leading Wildcard queries

Hi ramuK,

I believe you can turn that "on" via the Lucene QueryParser, but of
course such searches will be slo(oo)w.  You can also index reversed
tokens (e.g. *kumar --> ramuk*) or you could index n-grams with
begin/end delim characters (e.g. kumar -> ^ k u m a r $, *kumar -> "k u
m a r $")
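The reversed-token trick can be sketched as a client-side query rewrite (a hypothetical Python illustration; the field name metadata_rev is an assumption, and the reversal of indexed tokens would be done in an analyzer):

```python
REV_FIELD = "metadata_rev"  # hypothetical field holding reversed tokens

def rewrite_leading_wildcard(field, q):
    """Turn a slow leading-wildcard query (*abc) into a fast trailing
    one (cba*) against the reversed field; leave other queries alone."""
    if q.startswith("*") and not q.endswith("*"):
        return REV_FIELD, q[1:][::-1] + "*"
    return field, q
```

Note that the whole token is reversed character by character, so *kumar becomes ramuk* against the reversed field.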


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: "Jana, Kumar Raja" 
> To: solr-user@lucene.apache.org
> Sent: Thursday, January 15, 2009 9:49:24 AM
> Subject: RE: Customizing Solr to handle Leading Wildcard queries
> 
> Hi Erik,
> 
> Thanks for the quick reply.
> I want to enable leading wildcard query searches in general. The case
> mentioned in the earlier mail is just one of the many instances I use
> this feature.
> 
> -Kumar
> 
> 
> 
> 
> -Original Message-
> From: Erik Hatcher [mailto:e...@ehatchersolutions.com] 
> Sent: Thursday, January 15, 2009 7:59 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Customizing Solr to handle Leading Wildcard queries
> 
> 
> On Jan 15, 2009, at 8:23 AM, Jana, Kumar Raja wrote:
> > Not being able to perform Leading Wildcard queries is a major  
> > handicap.
> > I want to be able to perform searches like *.pdf to fetch all pdf
> > documents from Solr.
> 
> For this particular case, I recommend indexing the document type as a

> separate field.  Something like type:pdf (or use a MIME type string).

> Then you can do a very direct and fast query to search or facet by  
> document types.
> 
> Erik
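Erik's dedicated-type-field suggestion amounts to deriving the type at indexing time. A sketch (hypothetical helper; the field name "type" is an assumption):

```python
import os

def doc_type(filename):
    """Derive a 'type' field value from the file extension so that a
    plain query like type:pdf replaces the wildcard search *.pdf."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return ext or "unknown"
```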



Re: Optimizing & Improving results based on user feedback

2009-01-27 Thread Neal Richter
OK I've implemented this before, written academic papers and patents
related to this task.

Here are some hints:
   - you're on the right track with the editorial boosting elevators
   - http://wiki.apache.org/solr/UserTagDesign
   - be darn careful about assuming that one click is enough evidence
     to boost a long 'distance'
   - first-page effects in search will skew the learning badly if you
     don't compensate: 95% of users never go past the first page of
     results, and only 1% go past the second page, so perfectly good
     results on the second page get permanently locked out
   - consider forgetting what you learn under some condition
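The position-bias compensation and forgetting that these hints describe can be sketched like this (a toy model with illustrative numbers, not from the thread):

```python
import math

def examine_prob(rank):
    """Rough chance a user even looks at a result at this rank
    (illustrative position-bias model)."""
    return 1.0 / math.log2(rank + 1)

class ClickBooster:
    """Per-(query, doc) boost: weight clicks by how unlikely the user
    was to examine that rank, and decay old evidence over time."""

    def __init__(self, half_life_days=30.0):
        self.decay = math.log(2) / half_life_days
        self.scores = {}  # (query, doc_id) -> accumulated evidence

    def record_click(self, query, doc_id, rank):
        # A click at rank 10 is stronger evidence than one at rank 1,
        # since few users ever scroll that far.
        key = (query, doc_id)
        self.scores[key] = self.scores.get(key, 0.0) + 1.0 / examine_prob(rank)

    def age(self, days):
        # "Consider forgetting what you learn": exponential decay.
        factor = math.exp(-self.decay * days)
        for key in self.scores:
            self.scores[key] *= factor

    def boost(self, query, doc_id):
        return 1.0 + self.scores.get((query, doc_id), 0.0)
```

Boosts rise slowly with repeated clicks from the same search and drift back to neutral unless reinforced, which also limits the first-page lock-in effect.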

In fact this whole area is called 'learning to rank' and is a hot
research topic in IR.
http://web.mit.edu/shivani/www/Ranking-NIPS-05/
http://research.microsoft.com/en-us/um/people/lr4ir-2007/
https://research.microsoft.com/en-us/um/people/lr4ir-2008/

- Neal Richter


On Tue, Jan 27, 2009 at 2:06 PM, Matthew Runo  wrote:
> Hello folks!
>
> We've been thinking about ways to improve organic search results for a while
> (really, who hasn't?) and I'd like to get some ideas on ways to implement a
> feedback system that uses user behavior as input. Basically, it'd work on
> the premise that what the user actually clicked on is probably a really good
> match for their search, and should be boosted up in the results for that
> search.
>
> For example, if I search for "rain boots", and really love the 10th result
> down (and show it by clicking on it), then we'd like to capture this and use
> the data to boost up that result //for that search//. We've thought about
> using index time boosts for the documents, but that'd boost it regardless of
> the search terms, which isn't what we want. We've thought about using the
> Elevator handler, but we don't really want to force a product to the top -
> we'd prefer it slowly rises over time as more and more people click it from
> the same search terms. Another way might be to stuff the keyword into the
> document, the more times it's in the document the higher it'd score - but
> there's gotta be a better way than that.
>
> Obviously this can't be done 100% in solr - but if anyone had some clever
> ideas about how this might be possible it'd be interesting to hear them.
>
> Thanks for your time!
>
> Matthew Runo
> Software Engineer, Zappos.com
> mr...@zappos.com - 702-943-7833
>
>


Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi,

I am getting this error in the Tomcat log file on passing Chinese text to
the content field.
The content field uses the CJK tokenizer and is defined as




















INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=69

Jan 28, 2009 12:17:03 PM org.apache.solr.common.SolrException log

SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
((CTRL-CHAR, code 1))
 at [row,col {unknown-source}]: [2,76]
 at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
 at com.ctc.wstx.sr.BasicStreamReader.readTextPrimary(BasicStreamReader.java:4556)
 at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2888)
 at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
 at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:321)
 at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
 at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
 at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
 at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
 at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
 at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
 at java.lang.Thread.run(Thread.java:619)
regards
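The exception above means the text posted to /update contained a control character (code 1), which is illegal in XML 1.0; such bytes are a common leftover from PDF/Word text extraction. One workaround (a sketch, not from this thread) is to strip such characters before building the update XML:

```python
import re

# XML 1.0 forbids control characters below 0x20 except tab, LF and CR.
_ILLEGAL_XML = re.compile("[\x00-\x08\x0b\x0c\x0e-\x1f]")

def strip_illegal_xml_chars(text):
    """Drop characters that make Solr's XML parser throw
    WstxUnexpectedCharException (e.g. CTRL-CHAR, code 1)."""
    return _ILLEGAL_XML.sub("", text)
```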

On 1/28/09, revathy arun  wrote:
>
> Hi,
>
>
> This is the only info in the tomcat log at indexing
>
> Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=191
> I don't see any other errors in the logs.
>
> When I use curl to update, I get a success message.
>
> And the commit stats in the Solr admin look positive, but no index files
> are created.
>
> regards
> sujatha
>
>
>  On 1/27/09, Erik Hatcher  wrote:
>>
>> errors: 11
>>
>> What were those?
>>
>> My hunch is your indexer had issues.  What did Solr output into the
>> console or log during indexing?
>>
>>Erik
>>
>> On Jan 27, 2009, at 6:56 AM, revathy arun wrote:
>>
>> Hi Shalin,
>>>
>>> The admin page stats are as follows
>>> searcherName : searc...@1d4c3d5 main
>>> caching : true
>>> numDocs : 0
>>> maxDoc : 0
>>>
>>> *name: * /update  *class: *
>>> org.apache.solr.handler.XmlUpdateRequestHandler
>>> *version: * $Revision: 690026 $  *description: * Add documents with XML
>>>  *
>>> stats: *handlerStart : 1232692774389
>>> requests : 22
>>> errors : 11
>>> timeouts : 0
>>> totalTime : 1181
>>> avgTimePerRequest : 53.68182
>>> avgRequestsPerSecond : 6.0431463E-5
>>>
>>> *stats: *commits : 9
>>> autocommits : 0
>>> optimizes : 2
>>> docsPending : 0
>>> adds : 0
>>> deletesById : 0
>>> deletesByQuery : 0
>>> errors : 0
>>> cumulative_adds : 0
>>> cumulative_deletesById : 0
>>> cumulative_deletesByQuery : 0
>>> cumulative_errors : 0
>>>
>>> in the solrconfig.xml I have commented out this line
>>>
>>>
>>> 
>>>
>>> so the index will be created in the default data folder under solr home.
>>>
>>> Thanks for your time
>>>
>>> regards
>>>
>>> sujatha
>>> On 1/27/09, Shalin Shekhar Mangar  wrote:
>>>

 Are you looking for it in the right place? It is very unlikely that a
 commit
 happens and index is not created.

 The index is usually created inside the data directory as configured in
 your
 solrconfig.xml

 Can you search for *:* from the solr admin page and see if documents are
 returned?

 On Tue, Jan 27, 2009 at 5:01 PM, revathy arun 
 wrote:

 this is the stats of my updatehandler
> but i still dont see any index created
> *stats: *commits : 7
> autocommits : 0
> optimizes : 2
> docsPending : 0
> adds : 0
>>

Re: Text classification with Solr

2009-01-27 Thread Neal Richter
On Tue, Jan 27, 2009 at 2:21 PM, Grant Ingersoll  wrote:
> One of the things I am interested in is the marriage of Solr and Mahout
> (which has some Genetic Algorithms support) and other ML (Weka, etc.) tools.
 [snip]

I love it, good to know you are thinking big here.  Here's another big thought:
http://www.eml-r.org/nlp/papers/ponzetto07b.pdf .. but assume we want
to extract this type of structure from the full text of Wikipedia
rather than the narrow categories DB.

> Things that can help with all this:  LukeReqHandler, TermVectorComponent,
> TermsComponent, others
>

[snip]

> Neal, what did you have in mind for a JIRA issue?  I'd love to see a patch.

More research is needed, but the initial idea would be to enable
passing in a weighted term vector as a query and allow a
more-like-this type search on it. Has anyone attempted this yet?

An interesting point about faceting here is that it would give
feedback on which /new/ words (not in the initial query) would, if
added to the query, provide additional discrimination between the
matched categories.

So Solr outputs a set of categories for a document, and also emits a
set of related words to the initial query!  Categorization and
recommendation in one.
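The weighted-term-vector-as-query idea can be illustrated outside Solr with plain cosine scoring (a toy sketch; a real implementation would pull tf-idf-weighted vectors from the index, as MoreLikeThis does):

```python
import math

def cosine(v1, v2):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def classify(doc_vector, category_vectors, k=3):
    """Rank categories by similarity to a weighted term vector."""
    ranked = sorted(category_vectors.items(),
                    key=lambda item: cosine(doc_vector, item[1]),
                    reverse=True)
    return ranked[:k]
```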

- Neal


Re: Store limited text

2009-01-27 Thread Chris Harris
If you're using a Solr build post-r721758, then copyField has a
maxChars property you can take advantage of. I'm probably
misremembering some of the exact names of these elements/attributes,
but you can basically have this in your schema.xml:

<field name="f" type="text" indexed="true" stored="false"/>
<field name="f_for_retrieval" type="text" indexed="false" stored="true"/>
<copyField source="f" dest="f_for_retrieval" maxChars="1000000"/>

Then anything you store in field f will get copied for storage into
f_for_retrieval -- but only up to 1M chars.

Here the truncation is done by the field copy. Not sure that there's a
way to do it right now without a field copy.

On Tue, Jan 27, 2009 at 8:45 PM, Gargate, Siddharth  wrote:
> Hi All,
>    Is it possible to store only limited text in a field, say, max 1
> MB?  The maxFieldLength setting limits only the number of tokens to be
> indexed, but the complete content is stored.
>
> Thanks,
> Siddharth
>


Re: Setting dataDir in multicore environment

2009-01-27 Thread Mark Ferguson
This is just what I needed, thank you so much for the quick response! It's
really appreciated!

Mark


On Tue, Jan 27, 2009 at 9:59 PM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.p...@gmail.com> wrote:

> There is a patch given for SOLR-883 .
>
> On Wed, Jan 28, 2009 at 9:43 AM, Noble Paul നോബിള്‍  नोब्ळ्
>  wrote:
> > I shall give a patch today
> >
> > On Tue, Jan 27, 2009 at 11:58 PM, Mark Ferguson
> >  wrote:
> >> Oh I see, thanks for the clarification.
> >>
> >> Unfortunately this brings me back to same problem I started with:
> implicit
> >> properties aren't available when managing indexes through the REST api.
> I
> >> know there is a patch in the works for this issue but I can't wait for
> it.
> >> Is there any way to share the solrconfig.xml file and create indexes
> >> dynamically?
> >>
> >> Mark
> >>
> >>
> >> On Mon, Jan 26, 2009 at 9:02 PM, Noble Paul നോബിള്‍ नोब्ळ् <
> >> noble.p...@gmail.com> wrote:
> >>
> >>> The behavior is expected
> >>> properties set in solr.xml are not implicitly used anywhere.
> >>> you will have to use those variables explicitly in
> >>> solrconfig.xml/schema.xml
> >>> instead of hardcoding dataDir in solrconfig.xml you can use it as a
> >>> variable ${dataDir}
> >>>
> >>> BTW there is an issue (https://issues.apache.org/jira/browse/SOLR-943)
> >>> which helps you specify the dataDir in solr.xml
> >>>
> >>>
> >>> On Tue, Jan 27, 2009 at 5:19 AM, Mark Ferguson
> >>>  wrote:
> >>> > Hi,
> >>> >
> >>> > In my solr.xml file, I am trying to set the dataDir property the way
> it
> >>> is
> >>> > described in the CoreAdmin page on the wiki:
> >>> >
> >>> > 
> >>> >  
> >>> > 
> >>> >
> >>> > However, the property is being completely ignored. It is using
> whatever I
> >>> > have set in the solrconfig.xml file (or ./data, the default value, if
> I
> >>> set
> >>> > nothing in that file). Any idea what I am doing wrong? I am trying
> this
> >>> > approach to avoid using ${solr.core.name} in the solrconfig.xml
> file,
> >>> since
> >>> > dynamic properties are broken for creating cores via the REST api.
> >>> >
> >>> > Mark
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> --Noble Paul
> >>>
> >>
> >
> >
> >
> > --
> > --Noble Paul
> >
>
>
>
> --
> --Noble Paul
>


Re: question about dismax and parentheses

2009-01-27 Thread surfer10

i found Hoss's explanations at
http://www.nabble.com/Dismax-and-Grouping-query-td12938168.html#a12938168

It seems I can't do this, so my question turns into the following:

Can I join multiple dismax queries into one? For instance, if I'm looking for
+WORD1 +(WORD2 WORD3)
it can be translated into a +WORD1 +WORD2 query and a +WORD1 +WORD3 query.

Or can I join standard request handler queries on different fields into one?

-- 
View this message in context: 
http://www.nabble.com/question-about-dismax-and-parentheses-tp21699822p21700182.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Setting dataDir in multicore environment

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is a patch given for SOLR-883 .

On Wed, Jan 28, 2009 at 9:43 AM, Noble Paul നോബിള്‍  नोब्ळ्
 wrote:
> I shall give a patch today
>
> On Tue, Jan 27, 2009 at 11:58 PM, Mark Ferguson
>  wrote:
>> Oh I see, thanks for the clarification.
>>
>> Unfortunately this brings me back to same problem I started with: implicit
>> properties aren't available when managing indexes through the REST api. I
>> know there is a patch in the works for this issue but I can't wait for it.
>> Is there any way to share the solrconfig.xml file and create indexes
>> dynamically?
>>
>> Mark
>>
>>
>> On Mon, Jan 26, 2009 at 9:02 PM, Noble Paul നോബിള്‍ नोब्ळ् <
>> noble.p...@gmail.com> wrote:
>>
>>> The behavior is expected
>>> properties set in solr.xml are not implicitly used anywhere.
>>> you will have to use those variables explicitly in
>>> solrconfig.xml/schema.xml
>>> instead of hardcoding dataDir in solrconfig.xml you can use it as a
>>> variable ${dataDir}
>>>
>>> BTW there is an issue (https://issues.apache.org/jira/browse/SOLR-943)
>>> which helps you specify the dataDir in solr.xml
>>>
>>>
>>> On Tue, Jan 27, 2009 at 5:19 AM, Mark Ferguson
>>>  wrote:
>>> > Hi,
>>> >
>>> > In my solr.xml file, I am trying to set the dataDir property the way it
>>> is
>>> > described in the CoreAdmin page on the wiki:
>>> >
>>> > 
>>> >  
>>> > 
>>> >
>>> > However, the property is being completely ignored. It is using whatever I
>>> > have set in the solrconfig.xml file (or ./data, the default value, if I
>>> set
>>> > nothing in that file). Any idea what I am doing wrong? I am trying this
>>> > approach to avoid using ${solr.core.name} in the solrconfig.xml file,
>>> since
>>> > dynamic properties are broken for creating cores via the REST api.
>>> >
>>> > Mark
>>> >
>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>>
>>
>
>
>
> --
> --Noble Paul
>



-- 
--Noble Paul


Re: [dummy question] applying patch

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
since you are asking about a 'batch file', are you using Windows?
I recommend using TortoiseSVN to apply the patch.

On Wed, Jan 28, 2009 at 10:05 AM, surfer10  wrote:
>
> I'm a bit of a newbie with the Java toolchain, so could you please tell me
> what tools are used to apply patch SOLR-236 (field grouping)? Does it need
> to be applied to current solr-1.3 (and to nightly builds of 1.4), or is it
> already included out of the box?
>
> Which batch file in the distribution is used to compile Solr?
> --
> View this message in context: 
> http://www.nabble.com/-dummy-question--applying-patch-tp21699846p21699846.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


Store limited text

2009-01-27 Thread Gargate, Siddharth
Hi All,
Is it possible to store only limited text in a field, say, max 1
MB?  The maxFieldLength setting limits only the number of tokens to be
indexed, but the complete content is stored.
 
Thanks,
Siddharth


Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi,


This is the only info in the tomcat log at indexing

Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=191
I don't see any other errors in the logs.

When I use curl to update, I get a success message.

And the commit stats in the Solr admin look positive, but no index files
are created.

regards
sujatha


On 1/27/09, Erik Hatcher  wrote:
>
> errors: 11
>
> What were those?
>
> My hunch is your indexer had issues.  What did Solr output into the console
> or log during indexing?
>
>Erik
>
> On Jan 27, 2009, at 6:56 AM, revathy arun wrote:
>
> Hi Shalin,
>>
>> The admin page stats are as follows
>> searcherName : searc...@1d4c3d5 main
>> caching : true
>> numDocs : 0
>> maxDoc : 0
>>
>> *name: * /update  *class: *
>> org.apache.solr.handler.XmlUpdateRequestHandler
>> *version: * $Revision: 690026 $  *description: * Add documents with XML  *
>> stats: *handlerStart : 1232692774389
>> requests : 22
>> errors : 11
>> timeouts : 0
>> totalTime : 1181
>> avgTimePerRequest : 53.68182
>> avgRequestsPerSecond : 6.0431463E-5
>>
>> *stats: *commits : 9
>> autocommits : 0
>> optimizes : 2
>> docsPending : 0
>> adds : 0
>> deletesById : 0
>> deletesByQuery : 0
>> errors : 0
>> cumulative_adds : 0
>> cumulative_deletesById : 0
>> cumulative_deletesByQuery : 0
>> cumulative_errors : 0
>>
>> in the solrconfig.xml I have commented out this line
>>
>>
>> 
>>
>> so the index will be created in the default data folder under solr home.
>>
>> Thanks for your time
>>
>> regards
>>
>> sujatha
>> On 1/27/09, Shalin Shekhar Mangar  wrote:
>>
>>>
>>> Are you looking for it in the right place? It is very unlikely that a
>>> commit
>>> happens and index is not created.
>>>
>>> The index is usually created inside the data directory as configured in
>>> your
>>> solrconfig.xml
>>>
>>> Can you search for *:* from the solr admin page and see if documents are
>>> returned?
>>>
>>> On Tue, Jan 27, 2009 at 5:01 PM, revathy arun 
>>> wrote:
>>>
>>> this is the stats of my updatehandler
 but i still dont see any index created
 *stats: *commits : 7
 autocommits : 0
 optimizes : 2
 docsPending : 0
 adds : 0
 deletesById : 0
 deletesByQuery : 0
 errors : 0
 cumulative_adds : 0
 cumulative_deletesById : 0
 cumulative_deletesByQuery : 0
 cumulative_errors : 0

 regards

 On 1/27/09, revathy arun  wrote:

>
> Hi
>
> I have committed. The admin page does not show any docs pending or
> committed, or any errors.
>
> Regards
> Sujatha
>
>
> On 1/27/09, Shalin Shekhar Mangar  wrote:
>
>>
>> Did you commit after the updates?
>>
>> 2009/1/27 revathy arun 
>>
>> Hi,
>>>
>>> I have downloaded solr 1.3.0.
>>>
>>> I need to index Chinese content. For this I have defined a new field
>>> in the schema as
>>>
>>>
>>> >> positionIncrementGap="100">
>>>
>>> 
>>>
>>> 
>>>
>>> 
>>>
>>> 
>>>
>>> 
>>>
>>> 
>>>
>>> 
>>>
>>>
>>>
>>> I believe solr 1.3 already has the CJKAnalyzer by default.
>>>
>>> my schema in the testing stage has only 2 fields
>>>
>>> >>
>> required="true"
>>
>>> />
>>>
>>> >>
>> />
>>>

>>>
>>>
>>> However, when I index the Chinese text into content, no index is
>>> being created. I don't see any errors in Tomcat either.
>>>
>>> this is the only entry in tomcat on updating:
>>>
>>> Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/lang_prototype path=/update params={} status=0
>>> QTime=191
>>
>>>
>>> I have attached the chinese text file for reference.
>>>
>>>
>>>
>>> Regards
>>>
>>> sujatha
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>
>

>>>
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>>
>


[dummy question] applying patch

2009-01-27 Thread surfer10

I'm a bit of a newbie with the Java toolchain, so could you please tell me
what tools are used to apply patch SOLR-236 (field grouping)? Does it need
to be applied to current solr-1.3 (and to nightly builds of 1.4), or is it
already included out of the box?

Which batch file in the distribution is used to compile Solr?
-- 
View this message in context: 
http://www.nabble.com/-dummy-question--applying-patch-tp21699846p21699846.html
Sent from the Solr - User mailing list archive at Nabble.com.



question about dismax and parentheses

2009-01-27 Thread surfer10

Hello, dear members.
I'm a little bit confused about dismax syntax. As far as I know (and I might
be wrong) it supports the default query language, such as +WORD -WORD.

What about parentheses?

The title of my doc consists of WORD1 WORD2 WORD3. When I try to search
+WORD1 +(WORD2 WORD4) +WORD3 it does not match.

How can I query for that?

Also, could you please tell me how I can search for such a construction as a
phrase?
-- 
View this message in context: 
http://www.nabble.com/question-about-dismax-and-parentheses-tp21699822p21699822.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Connection mismanagement in Solrj?

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
if you are making requests in parallel, then it is likely that you
see many connections open at a time. They will get cleaned up over
time. But if you wish to clean them up explicitly, use
httpClient.getHttpConnectionManager()#closeIdleConnections()

On Tue, Jan 27, 2009 at 8:22 PM, Walter Underwood
 wrote:
> Making requests in parallel, using the default connection manager,
> which is multi-threaded, and we are reusing a single CommonsHttpSolrServer
> for all requests.
>
> wunder
>
> On 1/26/09 10:59 PM, "Noble Paul നോബിള്‍  नोब्ळ्" 
> wrote:
>
>> are you making requests in parallel ?
>> which ConnectionManager are you using for HttpClient?
>>
>> On Tue, Jan 27, 2009 at 11:58 AM, Noble Paul നോബിള്‍  नोब्ळ्
>>  wrote:
>>> you can set any connection parameters for the HttpClient and pass on
>>> the instance to CommonsHttpSolrServer and that will be used for making
>>> requests
>>>
>>> make sure that you are not reusing instance of CommonsHttpSolrServer
>>>
>>> On Tue, Jan 27, 2009 at 10:59 AM, Walter Underwood
>>>  wrote:
 We just switched to Solrj from a home-grown client and we have a huge
 jump in the number of connections to the server, enough that our
 load balancer was rejecting connections in production tonight.

 Does that sound familiar? We're running 1.3.

 I set the timeouts and connection pools to the same values I'd
 used in my other code, also based on HTTPClient.

 We can roll back to my code temporarily, but we want some of
 the Solrj facet support for a new project.

 wunder


>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>>
>>
>>
>
>



-- 
--Noble Paul


Re: Setting dataDir in multicore environment

2009-01-27 Thread Noble Paul നോബിള്‍ नोब्ळ्
I shall give a patch today

On Tue, Jan 27, 2009 at 11:58 PM, Mark Ferguson
 wrote:
> Oh I see, thanks for the clarification.
>
> Unfortunately this brings me back to same problem I started with: implicit
> properties aren't available when managing indexes through the REST api. I
> know there is a patch in the works for this issue but I can't wait for it.
> Is there any way to share the solrconfig.xml file and create indexes
> dynamically?
>
> Mark
>
>
> On Mon, Jan 26, 2009 at 9:02 PM, Noble Paul നോബിള്‍ नोब्ळ् <
> noble.p...@gmail.com> wrote:
>
>> The behavior is expected
>> properties set in solr.xml are not implicitly used anywhere.
>> you will have to use those variables explicitly in
>> solrconfig.xml/schema.xml
>> instead of hardcoding dataDir in solrconfig.xml you can use it as a
>> variable ${dataDir}
>>
>> BTW there is an issue (https://issues.apache.org/jira/browse/SOLR-943)
>> which helps you specify the dataDir in solr.xml
>>
>>
>> On Tue, Jan 27, 2009 at 5:19 AM, Mark Ferguson
>>  wrote:
>> > Hi,
>> >
>> > In my solr.xml file, I am trying to set the dataDir property the way it
>> is
>> > described in the CoreAdmin page on the wiki:
>> >
>> > 
>> >  
>> > 
>> >
>> > However, the property is being completely ignored. It is using whatever I
>> > have set in the solrconfig.xml file (or ./data, the default value, if I
>> set
>> > nothing in that file). Any idea what I am doing wrong? I am trying this
>> > approach to avoid using ${solr.core.name} in the solrconfig.xml file,
>> since
>> > dynamic properties are broken for creating cores via the REST api.
>> >
>> > Mark
>> >
>>
>>
>>
>> --
>> --Noble Paul
>>
>



-- 
--Noble Paul


Re: Connection mismanagement in Solrj?

2009-01-27 Thread Jon Baer
Could it be the framework you are using around it?  I know some IoC
containers will auto-pool objects underneath as a service, without you
really knowing it is being done, unless it is explicitly turned off.
Just a thought.  I use a single server for all requests behind a
HiveMind setup ... umm, not by choice :-\

- Jon

On Tue, Jan 27, 2009 at 12:32 PM, Ryan McKinley  wrote:

> if you use this constructor:
>
>  public CommonsHttpSolrServer(URL baseURL, HttpClient client)
>
> then solrj never touches the HttpClient configuration.
>
> I normally reuse a single CommonsHttpSolrServer as well.
>
>
>
> On Jan 27, 2009, at 9:52 AM, Walter Underwood wrote:
>
>  Making requests in parallel, using the default connection manager,
>> which is multi-threaded, and we are reusing a single CommonsHttpSolrServer
>> for all requests.
>>
>> wunder
>>
>> On 1/26/09 10:59 PM, "Noble Paul നോബിള്‍  नोब्ळ्" 
>> wrote:
>>
>>  are you making requests in parallel ?
>>> which ConnectionManager are you using for HttpClient?
>>>
>>> On Tue, Jan 27, 2009 at 11:58 AM, Noble Paul നോബിള്‍  नोब्ळ्
>>>  wrote:
>>>
 you can set any connection parameters for the HttpClient and pass on
 the instance to CommonsHttpSolrServer and that will be used for making
 requests

 make sure that you are not reusing instance of CommonsHttpSolrServer

 On Tue, Jan 27, 2009 at 10:59 AM, Walter Underwood
  wrote:

> We just switched to Solrj from a home-grown client and we have a huge
> jump in the number of connections to the server, enough that our
> load balancer was rejecting connections in production tonight.
>
> Does that sound familiar? We're running 1.3.
>
> I set the timeouts and connection pools to the same values I'd
> used in my other code, also based on HTTPClient.
>
> We can roll back to my code temporarily, but we want some of
> the Solrj facet support for a new project.
>
> wunder
>
>
>


 --
 --Noble Paul


>>>
>>>
>>
>


Re: Highlighting does not work?

2009-01-27 Thread Mike Klaas
They are documented in http://wiki.apache.org/solr/FieldOptionsByUseCase
and in the FAQ, but I agree that it could be more readily accessible.


-Mike

On 27-Jan-09, at 5:26 AM, Jarek Zgoda wrote:

Finally found that the fields have to have an analyzer to be
highlighted. Neat.

Can I ask somebody to document all these requirements?
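In schema.xml terms, the requirement is that a highlighted field be stored and use an analyzed (tokenized) field type; for example (hypothetical declarations, assuming the stock text type):

```xml
<field name="title" type="text" indexed="true" stored="true"/>
<field name="description" type="text" indexed="true" stored="true"/>
```

Fields that are stored but not indexed with an analyzer return no highlight snippets.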

Message written on 2009-01-27 at 13:49 by Jarek Zgoda:


I turned these fields to indexed + stored but the results are  
exactly the same, no matter if I search in these fields or elsewhere.


Message written on 2009-01-27 at 13:09 by Jarek Zgoda:



Solr 1.3

I'm trying to get highlighting working, with no luck so far.

Query with params
q=cyrus&fl=*,score&qt=standard&hl=true&hl.fl=title+description
finds 182 documents in my index. All of the top 10 hits contain the
word "cyrus", but the highlights list is empty. The fields "title" and
"description" are stored but not indexed. If I specify "*" as the
hl.fl value I get the same results.


Do I need to add some special configuration to enable the highlighting
feature?


--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
jarek.zg...@redefine.pl



--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
jarek.zg...@redefine.pl



--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
jarek.zg...@redefine.pl





Tools for Managing Synonyms, Elevate, etc.

2009-01-27 Thread
I'm considering building some tools for our internal non-technical staff
to write to synonyms.txt, elevate.xml, spellings.txt, and protwords.txt
so software developers don't have to maintain them.  Before my team
starts building these tools, has anyone done this before?  If so, are
these tools available as open source?  

Thanks,
Mark Cohen


Re: Optimizing & Improving results based on user feedback

2009-01-27 Thread Walter Underwood
I've been thinking about the same thing. We have a set of queries
that defy straightforward linguistics and ranking, like figuring
out how to match "charlie brown" to "It's the Great Pumpkin,
Charlie Brown" in October and to "A Charlie Brown Christmas"
in December.

I don't have any solutions yet, but I recommend analyzing click logs
and looking at queries where the most-clicked item is not #1.
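That click-log analysis can be sketched as follows (a toy illustration; assumes a log of (query, doc, rank-shown-at-click) tuples):

```python
from collections import defaultdict

def misranked_queries(click_log):
    """Return (query, doc) pairs where the most-clicked document was
    not the one shown at rank 1: candidates for ranking fixes."""
    clicks = defaultdict(lambda: defaultdict(int))
    last_rank = {}  # (query, doc) -> rank at the most recent click
    for query, doc, rank in click_log:
        clicks[query][doc] += 1
        last_rank[(query, doc)] = rank
    flagged = []
    for query, docs in clicks.items():
        top_doc = max(docs, key=docs.get)
        if last_rank[(query, top_doc)] != 1:
            flagged.append((query, top_doc))
    return flagged
```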

wunder

On 1/27/09 1:06 PM, "Matthew Runo"  wrote:

> Hello folks!
> 
> We've been thinking about ways to improve organic search results for a
> while (really, who hasn't?) and I'd like to get some ideas on ways to
> implement a feedback system that uses user behavior as input.
> Basically, it'd work on the premise that what the user actually
> clicked on is probably a really good match for their search, and
> should be boosted up in the results for that search.
> 
> For example, if I search for "rain boots", and really love the 10th
> result down (and show it by clicking on it), then we'd like to capture
> this and use the data to boost up that result //for that search//.
> We've thought about using index time boosts for the documents, but
> that'd boost it regardless of the search terms, which isn't what we
> want. We've thought about using the Elevator handler, but we don't
> really want to force a product to the top - we'd prefer it slowly
> rises over time as more and more people click it from the same search
> terms. Another way might be to stuff the keyword into the document,
> the more times it's in the document the higher it'd score - but
> there's gotta be a better way than that.
> 
> Obviously this can't be done 100% in solr - but if anyone had some
> clever ideas about how this might be possible it'd be interesting to
> hear them.
> 
> Thanks for your time!
> 
> Matthew Runo
> Software Engineer, Zappos.com
> mr...@zappos.com - 702-943-7833
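Walter's suggested click-log analysis can be sketched offline. Given a log of (query, clicked_rank, doc_id) events, flag the queries whose most-clicked document is not ranked first; the log layout is hypothetical, a sketch only:

```python
from collections import Counter

def misranked_queries(click_log):
    """Return {query: doc_id} for queries whose most-clicked document
    was not at rank 1, i.e. candidates for re-ranking attention."""
    by_query = {}
    for query, rank, doc in click_log:
        by_query.setdefault(query, Counter())[(doc, rank)] += 1
    flagged = {}
    for query, counts in by_query.items():
        (doc, rank), _count = counts.most_common(1)[0]
        if rank != 1:
            flagged[query] = doc
    return flagged
```

The output is just the worklist; how to feed it back (boosts, elevation, manual review) is the open question in the thread.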




Re: Text classification with Solr

2009-01-27 Thread Grant Ingersoll

I guess I've been called to the chalkboard...

I haven't looked specifically at putting the taxonomy in Lucene/Solr,
but it is an interesting idea. The paper you mentioned has some
interesting ideas, and Solr could obviously be used just as easily as
Lucene, I think.


One of the things I am interested in is the marriage of Solr and
Mahout (which has some Genetic Algorithms support) and other ML tools
(Weka, etc.). So, for instance, the paper uses multiple indexes, one
each for the negative and positive sets; that could be done with Solr
cores or just through intelligent filtering. Then you could have
Mahout do its training/clustering/whatever in the background as
needed, just by sending commands to a ReqHandler, and output its model
so it can be shared on the "output" side. That way you can nicely
serve up your results as part of search results or even standalone,
either as a SearchComponent or from the ReqHandler. Of course, the
tricky part is the implementation and managing the memory, threading,
etc.


Things that can help with all this:  LukeReqHandler,  
TermVectorComponent, TermsComponent, others


As for Hannes' question about "Why Solr": I think you can still get
close to the metal w/ Solr just as Lucene, but now you have the built  
in framework that makes experimentation so much easier, IMO, plus you  
have all the features that Solr has to offer.  For instance, a  
reasonable thing to do with the output from the classification is, of  
course, to facet on them.


Neal, what did you have in mind for a JIRA issue?  I'd love to see a  
patch.



On Jan 26, 2009, at 12:29 PM, Neal Richter wrote:


Hey all,

 I'm in the process of implementing a system to do 'text
classification' with Solr.  The basic idea is to take an
ontology/taxonomy like dmoz of {label: "X", tags: "a,b,c,d,e"}, index
it and then classify documents into the taxonomy by pushing parsed
document into the Solr search API.  Why?  Lucene/Solr's ability to do
weighted term boosting at both search and index time has lots of
obvious uses here.

Has anyone worked on this or a similar project yet?  I've seen some
talk on the list about this area but it's pretty thin... December
thread "Taxonomy Support on Solr".  I'm assuming Grant Ingersoll is
looking at similar things with his 'taming text' project.

I store the 'documents' in another repository and they are far too
dynamic (write intensive) for direct indexing in Solr... so the
previously suggested procedure of 1) store document 2) execute
more-like-this and 3) delete document would be too slow.

If people are interested I could start a JIRA issue on this (I do not
see anything there at the moment).

Thanks - Neal Richter
http://aicoder.blogspot.com


--
Grant Ingersoll
http://www.lucidimagination.com/

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ













Optimizing & Improving results based on user feedback

2009-01-27 Thread Matthew Runo

Hello folks!

We've been thinking about ways to improve organic search results for a  
while (really, who hasn't?) and I'd like to get some ideas on ways to  
implement a feedback system that uses user behavior as input.  
Basically, it'd work on the premise that what the user actually  
clicked on is probably a really good match for their search, and  
should be boosted up in the results for that search.


For example, if I search for "rain boots", and really love the 10th  
result down (and show it by clicking on it), then we'd like to capture  
this and use the data to boost up that result //for that search//.  
We've thought about using index time boosts for the documents, but  
that'd boost it regardless of the search terms, which isn't what we  
want. We've thought about using the Elevator handler, but we don't  
really want to force a product to the top - we'd prefer it slowly  
rises over time as more and more people click it from the same search  
terms. Another way might be to stuff the keyword into the document,  
the more times it's in the document the higher it'd score - but  
there's gotta be a better way than that.


Obviously this can't be done 100% in solr - but if anyone had some  
clever ideas about how this might be possible it'd be interesting to  
hear them.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833



Re: Indexing documents in multiple languages

2009-01-27 Thread Erick Erickson
First, I'd search the mail archive for the topic of languages, it's
been discussed often and there's a wealth of information that
might be of benefit, far more information than I can remember.

As to whether your approach will be "too big, too slow...", you
really haven't given enough information to go on. Here are a few
of the questions the answers to which would help: How many
e-mails are you indexing? Are you indexing attachments? How
many users do you expect to be using this system? What
are your target response times? What is your design
queries-per-second? How dynamic is the index (that is,
how many e-mails do you expect to add per day, and what
latency can you live with between the time an e-mail is
indexed and when it's searchable)?

If you're indexing 10,000 e-mails, it's one thing. If you're indexing
1,000,000,000 e-mails it's another.

Best
Erick

On Tue, Jan 27, 2009 at 3:05 PM, Alejandro Valdez <
alejandro.val...@gmail.com> wrote:

> Hi, I plan to use solr to index a large number of documents extracted
> from emails bodies, such documents could be in different languages,
> and a single  document could be in more than one language. In the same
> way, the query string could be words in different languages.
>
> I read that a common approach to index multilingual documents is to
> use some algorithm (n-gram) to determine the document language, then use a
> stemmer and finally index the document in a different index for each
> language.
>
> As the document language and the query string can't be detected in a
> reliable way, I think it makes no sense to use a stemmer on them
> because a stemmer is tied to a specific language.
>
> My plan is to index all the documents in the same index, without any
> stemming process (the users will have to search for the exact words that
> they are looking for).
>
> But I'm not sure if this approach will make the index too big, too
> slow, or if there is a better way to index this kind of documents.
>
> Any suggestion will be very appreciated.
>


Indexing documents in multiple languages

2009-01-27 Thread Alejandro Valdez
Hi, I plan to use solr to index a large number of documents extracted
from emails bodies, such documents could be in different languages,
and a single  document could be in more than one language. In the same
way, the query string could be words in different languages.

I read that a common approach to index multilingual documents is to
use some algorithm (n-gram) to determine the document language, then use a
stemmer and finally index the document in a different index for each
language.

As the document language and the query string can't be detected in a
reliable way, I think it makes no sense to use a stemmer on them
because a stemmer is tied to a specific language.

My plan is to index all the documents in the same index, without any
stemming process (the users will have to search for the exact words that
they are looking for).

But I'm not sure if this approach will make the index too big, too
slow, or if there is a better way to index this kind of documents.

Any suggestion will be very appreciated.


index size tripled during optimization

2009-01-27 Thread Qingdi


Hi,

Starting about one week ago, our index size gets tripled during
optimization.

The current index statistics are:
numDocs : 192702132 
size: 76G
And we do optimization for every 6M docs update. 

Since we keep getting new data, the index size increases every day. Before,
the index size was only doubled during optimization. 

Why does the index size get tripled instead of doubled during optimization? Is
there anything we can do to keep it only doubled during optimization?

Thanks.

Qingdi
-- 
View this message in context: 
http://www.nabble.com/index-size-tripled-during-optimization-tp21691596p21691596.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Text classification with Solr

2009-01-27 Thread Karl Wettin


27 jan 2009 kl. 17.23 skrev Neal Richter:



Is it really necessary to use Solr for it? Things go much faster with
the Lucene low-level API, and much faster still if you're loading the
classification corpus into RAM.


Good points.  At the moment I'd rather have a daemon with a service
API.. as well as the filtering/tokenization capabilities Solr has
built in.  Probably will attempt to get the corpus' index in memory
via large memory allocation.

If it doesn't scale then I'll either go to Lucene api or implement a
custom inverted index via memcached.

Other note /at the moment/ is that it's not going to be a deeply
hierarchical taxonomy, much less a full indexing of an RDF/OWL
schema.. there are some gotchas for that.


If your corpus is small enought you may want to take a look at lucene/ 
contrib/instantiated. It was made just for these sort of things.



karl




Re: Setting dataDir in multicore environment

2009-01-27 Thread Mark Ferguson
Oh I see, thanks for the clarification.

Unfortunately this brings me back to same problem I started with: implicit
properties aren't available when managing indexes through the REST api. I
know there is a patch in the works for this issue but I can't wait for it.
Is there any way to share the solrconfig.xml file and create indexes
dynamically?

Mark


On Mon, Jan 26, 2009 at 9:02 PM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.p...@gmail.com> wrote:

> The behavior is expected.
> Properties set in solr.xml are not implicitly used anywhere;
> you will have to reference those variables explicitly in
> solrconfig.xml/schema.xml.
> Instead of hardcoding dataDir in solrconfig.xml you can use it as a
> variable, e.g. ${dataDir}.
>
> BTW there is an issue (https://issues.apache.org/jira/browse/SOLR-943)
> which helps you specify the dataDir in solr.xml
>
>
> On Tue, Jan 27, 2009 at 5:19 AM, Mark Ferguson
>  wrote:
> > Hi,
> >
> > In my solr.xml file, I am trying to set the dataDir property the way it
> is
> > described in the CoreAdmin page on the wiki:
> >
> > 
> >  
> > 
> >
> > However, the property is being completed ignored. It is using whatever I
> > have set in the solrconfig.xml file (or ./data, the default value, if I
> set
> > nothing in that file). Any idea what I am doing wrong? I am trying this
> > approach to avoid using ${solr.core.name} in the solrconfig.xml file,
> since
> > dynamic properties are broken for creating cores via the REST api.
> >
> > Mark
> >
>
>
>
> --
> --Noble Paul
>


multiple indexes

2009-01-27 Thread Jae Joo
Hi,

I would like to know how it can be implemented.

Index1 has fields id,1,2,3 and index2 has fields id,5,6,7.
The ID in both indexes are unique id.

Can I use "a kind of " distributed search and/or multicore to search, sort,
and facet through 2 indexes (index1 and index2)?

Thanks,

Jae joo
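Solr 1.3's distributed search can merge, sort, and facet across two indexes via the shards request parameter, provided the document sets are disjoint and share the uniqueKey field; it does not join the fields of the same id across two indexes. A minimal sketch of building such a request over the plain HTTP API (host names and core paths are hypothetical):

```python
from urllib.parse import urlencode

def distributed_search_url(base, shards, q, sort=None, facet_field=None):
    """Build a Solr 1.3 distributed-search URL across several shards.

    `shards` is a list of host:port/path entries; Solr fans the query
    out to each and merges the sorted, faceted results."""
    params = [("q", q), ("shards", ",".join(shards))]
    if sort:
        params.append(("sort", sort))
    if facet_field:
        params.extend([("facet", "true"), ("facet.field", facet_field)])
    return base + "/select?" + urlencode(params)

url = distributed_search_url(
    "http://localhost:8983/solr",
    ["host1:8983/solr/index1", "host2:8983/solr/index2"],
    "id:123", sort="id asc")
```

If the two indexes instead hold different fields for the *same* ids, the documents would need to be merged into one schema before indexing.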


Re: Connection mismanagement in Solrj?

2009-01-27 Thread Ryan McKinley

if you use this constructor:

  public CommonsHttpSolrServer(URL baseURL, HttpClient client)

then solrj never touches the HttpClient configuration.

I normally reuse a single CommonsHttpSolrServer as well.


On Jan 27, 2009, at 9:52 AM, Walter Underwood wrote:


Making requests in parallel, using the default connection manager,
which is multi-threaded, and we are reusing a single CommonsHttpSolrServer
for all requests.

wunder

On 1/26/09 10:59 PM, "Noble Paul നോബിള്‍  नो 
ब्ळ्" 

wrote:


are you making requests in parallel ?
which ConnectionManager are you using for HttpClient?

On Tue, Jan 27, 2009 at 11:58 AM, Noble Paul നോബിള്‍   
नोब्ळ्

 wrote:

you can set any connection parameters for the HttpClient and pass on
the instance to CommonsHttpSolrServer and that will be used for making
requests.

make sure that you are not reusing instance of CommonsHttpSolrServer

On Tue, Jan 27, 2009 at 10:59 AM, Walter Underwood
 wrote:
We just switched to Solrj from a home-grown client and we have a  
huge

jump in the number of connections to the server, enough that our
load balancer was rejecting connections in production tonight.

Does that sound familiar? We're running 1.3.

I set the timeouts and connection pools to the same values I'd
used in my other code, also based on HTTPClient.

We can roll back to my code temporarily, but we want some of
the Solrj facet support for a new project.

wunder






--
--Noble Paul










Re: Connection mismanagement in Solrj?

2009-01-27 Thread Yonik Seeley
That's interesting; SolrJ doesn't touch the HttpClient params if one is
provided in the constructor.

I guess I'd try to sniff the headers first and see if any difference
sticks out between the clients.
I normally just use netcat and pretend to be the solr server.

-Yonik


On Tue, Jan 27, 2009 at 12:29 AM, Walter Underwood
 wrote:
> We just switched to Solrj from a home-grown client and we have a huge
> jump in the number of connections to the server, enough that our
> load balancer was rejecting connections in production tonight.
>
> Does that sound familiar? We're running 1.3.
>
> I set the timeouts and connection pools to the same values I'd
> used in my other code, also based on HTTPClient.
>
> We can roll back to my code temporarily, but we want some of
> the Solrj facet support for a new project.
>
> wunder
>
>


query with stemming, prefix and fuzzy?

2009-01-27 Thread Gert Brinkmann
Hello,

I am trying to get Solr to properly work. I have set up a Solr test
server (using jetty as mentioned in the tutorial). Also I had to modify
the schema.xml so that I have different fields for different languages
(with their own stemmers) that occur in the content management system
that I am indexing. So far everything does work fine including snippet
highlighting.

But now I am having some problems with two things:

A) fuzzy search

When trying to do a fuzzy search the analyzers seem to break up a search
string like "house~0.6" into "house", "0" and "6" so that e.g. a single
"6" is highlighted, too. So I tried to use an additional raw-field
without any stemming and just a lower case and white space analyzer.
This seems to work fine. But fuzzy query is very slow and takes 100% CPU
for several seconds with only one query at a time.

What can I do to speed up the fuzzy query? I e.g. have found a Lucene
parameter prefixLength but no according Solr option. Does this exist?
Are there some other options to pay attention to?


B) combine stemming, prefix and fuzzy search

Is there a way to combine all this three query types in one query?
Especially stemming and prefixing? I think it would be problematic as a
"house*" would be analyzed to "house" with the usual analyzers that are
required for stemming?

Do I need different query type fields and combine them with an boolean
OR in the query? Something like

  data:house OR data_fuzzy:house~0.6 OR data_prefix:house*

This feels to be a little bit circuitous. Is there a way to use
"house*~.6" including correct stemming?

Thank you,
Gert
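The "circuitous" three-field OR query above can at least be generated programmatically. A sketch that expands one user term into the per-field clauses; the field names data, data_fuzzy, and data_prefix are taken from the mail, and Solr itself has no combined syntax like house*~.6:

```python
def multi_mode_query(term, min_sim=0.6):
    """Expand one user term into stemmed, fuzzy and prefix clauses,
    each against the field analyzed for that matching mode."""
    return " OR ".join([
        "data:%s" % term,                       # stemmed/analyzed field
        "data_fuzzy:%s~%s" % (term, min_sim),   # raw lowercased field
        "data_prefix:%s*" % term,               # un-stemmed field for prefix
    ])

q = multi_mode_query("house")
```

The expanded string is then URL-encoded into the q parameter as usual.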


Re: Text classification with Solr

2009-01-27 Thread Neal Richter
On Tue, Jan 27, 2009 at 1:36 AM, Hannes Carl Meyer  wrote:
> Yeah, know it, the challenge on this method is the calculation of the score
> and parametrization of thresholds.

Not as worried about score itself as the score thresholds for prediction in/out.

> Is it really necessary to use Solr for it? Things go much faster with
> the Lucene low-level API, and much faster still if you're loading the
> classification corpus into RAM.

Good points.  At the moment I'd rather have a daemon with a service
API.. as well as the filtering/tokenization capabilities Solr has
built in.  Probably will attempt to get the corpus' index in memory
via large memory allocation.

If it doesn't scale then I'll either go to Lucene api or implement a
custom inverted index via memcached.

Other note /at the moment/ is that it's not going to be a deeply
hierarchical taxonomy, much less a full indexing of an RDF/OWL
schema.. there are some gotchas for that.

Thanks - Neal


Re: QParserPlugin

2009-01-27 Thread Karl Wettin

So it was me defining it in schema.xml rather than solrconfig.xml.

17:17 < erikhatcher> where are you defining the qparser plugin?
17:18 < erikhatcher> it's very odd... if it isn't picking them up but  
you reference them, it would certainly give an error
17:18 < karlwettin> as a first level child to schema element in  
schema.xml

17:19 < erikhatcher> qparser plugins go in solrconfig, not schema
17:19 < karlwettin> aha
17:19 < karlwettin> :)
17:19 < erikhatcher> :)


27 jan 2009 kl. 08.25 skrev Erik Hatcher:

Karl - where did you put your a.b.QParserPlugin?   You should put it  
in /lib within a JAR file.  I'm surprised you aren't  
seeing an error though.


Erik

On Jan 27, 2009, at 1:07 AM, Karl Wettin wrote:


Hi forum,

I'm trying to get QParserPlugin to work, I've got


but still get Unknown query type 'myqueryparser' when I
/solr/select/?defType=myqueryparser&q=foo

There is no warning about myqueryparser from Solr at startup.

I do however manage to get this working:

  
  


So it shouldn't be my Solr environment or a classpath problem?
That's the extent of my Solr setup; I'm left with no clues as to
why it doesn't register.



gratefully,

  karl






Re: solrj delete by Id problem

2009-01-27 Thread Shalin Shekhar Mangar
On Tue, Jan 27, 2009 at 8:51 PM, Parisa  wrote:

>
> I found how the issue is created: when Solr warms up the new searcher
> with the cache lists, and queryResultCache is enabled, the issue appears.
>
> Note: as I mentioned before, I commit with waitFlush=false and
> waitSearcher=false.
>
> So it has a problem when queryResultCache is on.
>

Ah, so that is the issue. The problem is that when you call commit with
waitSearcher=false and waitFlush=false, the call returns immediately, without
waiting for the commit to complete and the new searcher to be registered.
Therefore any queries you make before autowarming completes do not give you
the results from the new index.

You should call commit with both waitSearcher and waitFlush as true. That
should solve the problem.

-- 
Regards,
Shalin Shekhar Mangar.
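For anyone driving Solr over plain HTTP rather than SolrJ, the same advice amounts to posting a commit message with both flags set. A sketch; the URL is the stock example-server address and the posting step assumes a running Solr 1.3 instance:

```python
def commit_message(wait_flush=True, wait_searcher=True):
    """XML body POSTed to /update. With both flags true the request
    blocks until the new searcher is registered, so queries issued
    afterwards see the committed adds and deletes."""
    return '<commit waitFlush="%s" waitSearcher="%s"/>' % (
        str(wait_flush).lower(), str(wait_searcher).lower())

msg = commit_message()

# Posting it would look like this (requires a running Solr; not run here):
# import urllib.request
# req = urllib.request.Request("http://localhost:8983/solr/update",
#                              data=msg.encode("utf-8"),
#                              headers={"Content-Type": "text/xml"})
# urllib.request.urlopen(req)
```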


Re: solrj delete by Id problem

2009-01-27 Thread Parisa

I found how the issue is created: when Solr warms up the new searcher with
the cache lists, and queryResultCache is enabled, the issue appears.

Note: as I mentioned before, I commit with waitFlush=false and
waitSearcher=false.

So it has a problem when queryResultCache is on,

but I don't know why the issue appears only in deleteById mode; we don't
have a problem when we add a doc and commit with waitFlush=false and
waitSearcher=false.

I think they both use the same method to warm up the new searcher!

There is also a comment in the SolrCore class that I am concerned about:

solrCore.java

public RefCounted<SolrIndexSearcher> getSearcher(boolean forceNew, boolean
returnSearcher, final Future[] waitSearcher) throws IOException {


--


 line 1132 (nightly version)

  // warm the new searcher based on the current searcher.
  // should this go before the other event handlers or after?

 
  if (currSearcher != null) {
future = searcherExecutor.submit(
new Callable() {
  public Object call() throws Exception {
try {
  newSearcher.warm(currSearcher);
} catch (Throwable e) {
  SolrException.logOnce(log,null,e);
}
return null;
  }
}
);
  }

-
--

} 
-- 
View this message in context: 
http://www.nabble.com/solrj-delete-by-Id-problem-tp21433056p21687431.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: fastest way to index/reindex

2009-01-27 Thread Erik Hatcher
*:* will default to sorting by document insertion order (Lucene's  
document id, _not_ your Solr uniqueKey).  And no, you won't miss any  
by paging - order will be maintained.


Erik

On Jan 27, 2009, at 9:52 AM, Ian Connor wrote:

When you query by *:*, what order does it use? Is there a chance they will
come in a different order as you page through the results (and miss/duplicate
some)? Is it best to put the order explicitly by 'id', or is that implied
already?

On Mon, Jan 26, 2009 at 12:00 PM, Ian Connor   
wrote:


*:* took it up to 45/sec from 28/sec so a nice 60% bump in  
performance -

thanks!


On Sun, Jan 25, 2009 at 5:46 PM, Ryan McKinley   
wrote:



I don't know of any standard export/import tool -- i think luke has
something, but it will be faster if you write your own.

Rather than id:[* TO *], just try *:* -- this should match all documents
without using a range query.



On Jan 25, 2009, at 3:16 PM, Ian Connor wrote:

Hi,


Given the only real way to reindex is to save the document again, what is
the fastest way to extract all the documents from a solr index to resave
them?

I have tried the id:[* TO *] trick; however, it takes a while once you
get a few thousand into the index. Are there any tools that will quickly
export the index to a text file, or is making queries 1000 at a time the
best option, dealing with the time it takes to query once you are deep
into the index?

--
Regards,

Ian Connor







--
Regards,

Ian Connor





--
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor
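The paging loop discussed in this thread reduces to generating (start, rows) windows for repeated q=*:* requests; per Erik's note an explicit sort is optional, since insertion order is maintained. A sketch:

```python
def export_pages(num_docs, page_size=1000):
    """Yield (start, rows) pairs that cover the whole index when used
    as the start/rows parameters of successive q=*:* requests."""
    for start in range(0, num_docs, page_size):
        yield start, min(page_size, num_docs - start)

pages = list(export_pages(2500, 1000))
```

Each pair maps directly onto a request such as /select?q=*:*&start=2000&rows=500; be aware that very deep start offsets get progressively slower.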




Re: fastest way to index/reindex

2009-01-27 Thread Ian Connor
When you query by *:*, what order does it use? Is there a chance they will
come in a different order as you page through the results (and miss/duplicate
some)? Is it best to put the order explicitly by 'id', or is that implied
already?

On Mon, Jan 26, 2009 at 12:00 PM, Ian Connor  wrote:

> *:* took it up to 45/sec from 28/sec so a nice 60% bump in performance -
> thanks!
>
>
> On Sun, Jan 25, 2009 at 5:46 PM, Ryan McKinley  wrote:
>
>> I don't know of any standard export/import tool -- i think luke has
>> something, but it will be faster if you write your own.
>>
>> Rather than id:[* TO *], just try *:*  -- this should match all documents
>> without using a range query.
>>
>>
>>
>> On Jan 25, 2009, at 3:16 PM, Ian Connor wrote:
>>
>>  Hi,
>>>
>>> Given the only real way to reindex is to save the document again, what is
>>> the fastest way to extract all the documents from a solr index to resave
>>> them.
>>>
>>> I have tried the id:[* TO *] trick however, it takes a while once you get
>>> a
>>> few thousand into the index. Are there any tools that will quickly export
>>> the index to a text file or making queries 1000 at a time is the best
>>> option
>>> and dealing with the time it takes to query once you are deep into the
>>> index?
>>>
>>> --
>>> Regards,
>>>
>>> Ian Connor
>>>
>>
>>
>
>
> --
> Regards,
>
> Ian Connor
>



-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor


Re: Connection mismanagement in Solrj?

2009-01-27 Thread Walter Underwood
Making requests in parallel, using the default connection manager,
which is multi-threaded, and we are reusing a single CommonsHttpSolrServer
for all requests.

wunder

On 1/26/09 10:59 PM, "Noble Paul നോബിള്‍  नोब्ळ्" 
wrote:

> are you making requests in parallel ?
> which ConnectionManager are you using for HttpClient?
> 
> On Tue, Jan 27, 2009 at 11:58 AM, Noble Paul നോബിള്‍  नोब्ळ्
>  wrote:
>> you can set any connection parameters for the HttpClient and pass on
>> the instance to CommonsHttpSolrServer and that will be used for making
>> requests
>> 
>> make sure that you are not reusing instance of CommonsHttpSolrServer
>> 
>> On Tue, Jan 27, 2009 at 10:59 AM, Walter Underwood
>>  wrote:
>>> We just switched to Solrj from a home-grown client and we have a huge
>>> jump in the number of connections to the server, enough that our
>>> load balancer was rejecting connections in production tonight.
>>> 
>>> Does that sound familiar? We're running 1.3.
>>> 
>>> I set the timeouts and connection pools to the same values I'd
>>> used in my other code, also based on HTTPClient.
>>> 
>>> We can roll back to my code temporarily, but we want some of
>>> the Solrj facet support for a new project.
>>> 
>>> wunder
>>> 
>>> 
>> 
>> 
>> 
>> --
>> --Noble Paul
>> 
> 
> 



Re: Error in Integrating JBoss 4.2 and Solr-1.3.0:

2009-01-27 Thread maveen

I am also getting the same issue. Did anyone find a solution for this?
Please respond.

sbutalia wrote:
> 
> I'm having the same issue.. have you had any progress with this?
> 

-- 
View this message in context: 
http://www.nabble.com/Error-in-Integrating-JBoss-4.2-and-Solr-1.3.0%3A-tp20202032p21686321.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Highlighting does not work?

2009-01-27 Thread Jarek Zgoda
Finally found that the fields have to have an analyzer to be  
highlighted. Neat.


Can I ask somebody to document all these requirements?
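For reference, the requirement found here amounts to something like the following in schema.xml. A sketch only: the field names are from this thread, and "text" stands in for any analyzed field type such as the one in the Solr 1.3 example schema:

```xml
<!-- Fields must be indexed with an analyzed type for highlighting to
     produce snippets; stored="true" is needed so the snippet text is
     available to return. -->
<field name="title" type="text" indexed="true" stored="true"/>
<field name="description" type="text" indexed="true" stored="true"/>
```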

Wiadomość napisana w dniu 2009-01-27, o godz. 13:49, przez Jarek Zgoda:

I turned these fields to indexed + stored but the results are  
exactly the same, no matter if I search in these fields or elsewhere.


Wiadomość napisana w dniu 2009-01-27, o godz. 13:09, przez Jarek  
Zgoda:



Solr 1.3

I'm trying to get highlighting working, with no luck so far.

Query with params q=cyrus&fl=*,score&qt=standard&hl=true&hl.fl=title 
+description finds 182 documents in my index. All of the top 10  
hits contain the word "cyrus", but the highlights list is empty.  
The fields "title" and "description" are stored but not indexed. If  
I specify "*" as hl.fl value I get the same results.


Do I need to add some special configuration to enable highlighting  
feature?


--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
jarek.zg...@redefine.pl






Re: multilanguage prototype

2009-01-27 Thread Erik Hatcher

errors: 11

What were those?

My hunch is your indexer had issues.  What did Solr output into the  
console or log during indexing?


Erik

On Jan 27, 2009, at 6:56 AM, revathy arun wrote:


Hi Shalin,

The admin page stats are as follows
searcherName : searc...@1d4c3d5 main
caching : true
numDocs : 0
maxDoc : 0

*name: * /update  *class: *  
org.apache.solr.handler.XmlUpdateRequestHandler
*version: * $Revision: 690026 $  *description: * Add documents with  
XML  *

stats: *handlerStart : 1232692774389
requests : 22
errors : 11
timeouts : 0
totalTime : 1181
avgTimePerRequest : 53.68182
avgRequestsPerSecond : 6.0431463E-5

*stats: *commits : 9
autocommits : 0
optimizes : 2
docsPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 0
cumulative_deletesById : 0
cumulative_deletesByQuery : 0
cumulative_errors : 0

In solrconfig.xml I have commented out this line




so the index will be created in the default data folder under solr  
home,




Thanks for your time

regards

sujatha
On 1/27/09, Shalin Shekhar Mangar  wrote:


Are you looking for it in the right place? It is very unlikely that a
commit
happens and index is not created.

The index is usually created inside the data directory as  
configured in

your
solconfig.xml

Can you search for *:* from the solr admin page and see if  
documents are

returned?

On Tue, Jan 27, 2009 at 5:01 PM, revathy arun   
wrote:



this is the stats of my updatehandler
but i still dont see any index created
*stats: *commits : 7
autocommits : 0
optimizes : 2
docsPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 0
cumulative_deletesById : 0
cumulative_deletesByQuery : 0
cumulative_errors : 0

regards

On 1/27/09, revathy arun  wrote:


Hi

I have committed. The admin page does not show any docs pending or

committed

or any errors.

Regards
Sujatha


On 1/27/09, Shalin Shekhar Mangar  wrote:


Did you commit after the updates?

2009/1/27 revathy arun 


Hi,

I have downloade solr1.3.0 .

I need to index chinese content ,for this i have defined a new  
field

in

the

schema

as




















I beleive solr1.3 already has the cjkanalyzer by default.

my schema in the testing stage has only 2 fields


required="true"

/>

stored="false"

/>




However, when I index the Chinese text into content, no index is
being created. I don't see any errors in Tomcat either.

this is only entry in tomcat on updating

Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/lang_prototype path=/update params={} status=0

QTime=191


I have attached the chinese text file for reference.



Regards

sujatha







--
Regards,
Shalin Shekhar Mangar.










--
Regards,
Shalin Shekhar Mangar.





Re: Highlighting does not work?

2009-01-27 Thread Jarek Zgoda
I turned these fields to indexed + stored but the results are exactly  
the same, no matter if I search in these fields or elsewhere.


Wiadomość napisana w dniu 2009-01-27, o godz. 13:09, przez Jarek Zgoda:


Solr 1.3

I'm trying to get highlighting working, with no luck so far.

Query with params q=cyrus&fl=*,score&qt=standard&hl=true&hl.fl=title 
+description finds 182 documents in my index. All of the top 10 hits  
contain the word "cyrus", but the highlights list is empty. The  
fields "title" and "description" are stored but not indexed. If I  
specify "*" as hl.fl value I get the same results.


Do I need to add some special configuration to enable highlighting  
feature?


--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
jarek.zg...@redefine.pl






Highlighting does not work?

2009-01-27 Thread Jarek Zgoda

Solr 1.3

I'm trying to get highlighting working, with no luck so far.

Query with params q=cyrus&fl=*,score&qt=standard&hl=true&hl.fl=title 
+description finds 182 documents in my index. All of the top 10 hits  
contain the word "cyrus", but the highlights list is empty. The fields  
"title" and "description" are stored but not indexed. If I specify "*"  
as hl.fl value I get the same results.


Do I need to add some special configuration to enable highlighting  
feature?


--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
jarek.zg...@redefine.pl



Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi Shalin,

The admin page stats are as follows
searcherName : searc...@1d4c3d5 main
caching : true
numDocs : 0
maxDoc : 0

name: /update
class: org.apache.solr.handler.XmlUpdateRequestHandler
version: $Revision: 690026 $
description: Add documents with XML
stats:
handlerStart : 1232692774389
requests : 22
errors : 11
timeouts : 0
totalTime : 1181
avgTimePerRequest : 53.68182
avgRequestsPerSecond : 6.0431463E-5

stats:
commits : 9
autocommits : 0
optimizes : 2
docsPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 0
cumulative_deletesById : 0
cumulative_deletesByQuery : 0
cumulative_errors : 0

In solrconfig.xml I have commented out the <dataDir> line, so the index
will be created in the default data folder under the Solr home.



Thanks for your time.

Regards,
Sujatha
On 1/27/09, Shalin Shekhar Mangar  wrote:
>
> Are you looking for it in the right place? It is very unlikely that a
> commit
> happens and index is not created.
>
> The index is usually created inside the data directory as configured in
> your
> solrconfig.xml
>
> Can you search for *:* from the solr admin page and see if documents are
> returned?
>
> On Tue, Jan 27, 2009 at 5:01 PM, revathy arun  wrote:
>
> > this is the stats of my updatehandler
> > but i still dont see any index created
> > *stats: *commits : 7
> > autocommits : 0
> > optimizes : 2
> > docsPending : 0
> > adds : 0
> > deletesById : 0
> > deletesByQuery : 0
> > errors : 0
> > cumulative_adds : 0
> > cumulative_deletesById : 0
> > cumulative_deletesByQuery : 0
> > cumulative_errors : 0
> >
> > regards
> >
> > On 1/27/09, revathy arun  wrote:
> > >
> > > Hi
> > >
> > > I have committed.The admin page does not show any docs pending or
> > committed
> > > or any errors.
> > >
> > > Regards
> > > Sujatha
> > >
> > >
> > >  On 1/27/09, Shalin Shekhar Mangar  wrote:
> > >>
> > >> Did you commit after the updates?
> > >>
> > >> 2009/1/27 revathy arun 
> > >>
> > >> > Hi,
> > >> >
> > >> > I have downloade solr1.3.0 .
> > >> >
> > >> > I need to index chinese content ,for this i have defined a new field
> > in
> > >> the
> > >> > schema
> > >> >
> > >> > as
> > >> >
> > >> >
> > >> >  > >> > positionIncrementGap="100">
> > >> >
> > >> > 
> > >> >
> > >> > 
> > >> >
> > >> > 
> > >> >
> > >> > 
> > >> >
> > >> > 
> > >> >
> > >> > 
> > >> >
> > >> > 
> > >> >
> > >> >
> > >> >
> > >> > I beleive solr1.3 already has the cjkanalyzer by default.
> > >> >
> > >> > my schema in the testing stage has only 2 fields
> > >> >
> > >> >  > >> required="true"
> > >> > />
> > >> >
> > >> >  />
> > >> >
> > >> >
> > >> >
> > >> > However when i index the chinese text into content , no index is
> being
> > >> > created.i dont see any errors in tomcat as well .
> > >> >
> > >> > this is only entry in tomcat on updating
> > >> >
> > >> > Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute
> > >> > INFO: [] webapp=/lang_prototype path=/update params={} status=0
> > >> QTime=191
> > >> >
> > >> >  I have attached the chinese text file for reference.
> > >> >
> > >> >
> > >> >
> > >> > Regards
> > >> >
> > >> > sujatha
> > >> >
> > >> >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Regards,
> > >> Shalin Shekhar Mangar.
> > >>
> > >
> > >
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: multilanguage prototype

2009-01-27 Thread Shalin Shekhar Mangar
Are you looking for it in the right place? It is very unlikely that a commit
happens and no index is created.

The index is usually created inside the data directory as configured in your
solrconfig.xml

Can you search for *:* from the solr admin page and see if documents are
returned?

On Tue, Jan 27, 2009 at 5:01 PM, revathy arun  wrote:

> this is the stats of my updatehandler
> but i still dont see any index created
> *stats: *commits : 7
> autocommits : 0
> optimizes : 2
> docsPending : 0
> adds : 0
> deletesById : 0
> deletesByQuery : 0
> errors : 0
> cumulative_adds : 0
> cumulative_deletesById : 0
> cumulative_deletesByQuery : 0
> cumulative_errors : 0
>
> regards
>
> On 1/27/09, revathy arun  wrote:
> >
> > Hi
> >
> > I have committed.The admin page does not show any docs pending or
> committed
> > or any errors.
> >
> > Regards
> > Sujatha
> >
> >
> >  On 1/27/09, Shalin Shekhar Mangar  wrote:
> >>
> >> Did you commit after the updates?
> >>
> >> 2009/1/27 revathy arun 
> >>
> >> > Hi,
> >> >
> >> > I have downloade solr1.3.0 .
> >> >
> >> > I need to index chinese content ,for this i have defined a new field
> in
> >> the
> >> > schema
> >> >
> >> > as
> >> >
> >> >
> >> >  >> > positionIncrementGap="100">
> >> >
> >> > 
> >> >
> >> > 
> >> >
> >> > 
> >> >
> >> > 
> >> >
> >> > 
> >> >
> >> > 
> >> >
> >> > 
> >> >
> >> >
> >> >
> >> > I beleive solr1.3 already has the cjkanalyzer by default.
> >> >
> >> > my schema in the testing stage has only 2 fields
> >> >
> >> >  >> required="true"
> >> > />
> >> >
> >> > 
> >> >
> >> >
> >> >
> >> > However when i index the chinese text into content , no index is being
> >> > created.i dont see any errors in tomcat as well .
> >> >
> >> > this is only entry in tomcat on updating
> >> >
> >> > Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute
> >> > INFO: [] webapp=/lang_prototype path=/update params={} status=0
> >> QTime=191
> >> >
> >> >  I have attached the chinese text file for reference.
> >> >
> >> >
> >> >
> >> > Regards
> >> >
> >> > sujatha
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
> >
> >
>



-- 
Regards,
Shalin Shekhar Mangar.
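The suggestion above (search for *:* from the admin page) boils down to reading numFound from the select response. A small sketch, using a made-up trimmed response body:

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed response to q=*:*; numFound tells you whether
# any documents actually made it into the index after the commit.
SAMPLE = '<response><result name="response" numFound="0" start="0"/></response>'

def num_found(xml_text):
    """Read numFound from a Solr XML select response."""
    result = ET.fromstring(xml_text).find("./result[@name='response']")
    return int(result.get("numFound"))

print(num_found(SAMPLE))  # -> 0
```

A numFound of 0 right after a successful-looking commit usually points at the add step (documents rejected or never sent) rather than at the data directory location.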


Re: multilanguage prototype

2009-01-27 Thread revathy arun
These are the stats of my update handler, but I still don't see any index created:
stats:
commits : 7
autocommits : 0
optimizes : 2
docsPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 0
cumulative_deletesById : 0
cumulative_deletesByQuery : 0
cumulative_errors : 0

regards

On 1/27/09, revathy arun  wrote:
>
> Hi
>
> I have committed.The admin page does not show any docs pending or committed
> or any errors.
>
> Regards
> Sujatha
>
>
>  On 1/27/09, Shalin Shekhar Mangar  wrote:
>>
>> Did you commit after the updates?
>>
>> 2009/1/27 revathy arun 
>>
>> > Hi,
>> >
>> > I have downloade solr1.3.0 .
>> >
>> > I need to index chinese content ,for this i have defined a new field in
>> the
>> > schema
>> >
>> > as
>> >
>> >
>> > > > positionIncrementGap="100">
>> >
>> > 
>> >
>> > 
>> >
>> > 
>> >
>> > 
>> >
>> > 
>> >
>> > 
>> >
>> > 
>> >
>> >
>> >
>> > I beleive solr1.3 already has the cjkanalyzer by default.
>> >
>> > my schema in the testing stage has only 2 fields
>> >
>> > > required="true"
>> > />
>> >
>> > 
>> >
>> >
>> >
>> > However when i index the chinese text into content , no index is being
>> > created.i dont see any errors in tomcat as well .
>> >
>> > this is only entry in tomcat on updating
>> >
>> > Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute
>> > INFO: [] webapp=/lang_prototype path=/update params={} status=0
>> QTime=191
>> >
>> >  I have attached the chinese text file for reference.
>> >
>> >
>> >
>> > Regards
>> >
>> > sujatha
>> >
>> >
>> >
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>


Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi

I have committed. The admin page does not show any docs pending or committed,
nor any errors.

Regards
Sujatha


On 1/27/09, Shalin Shekhar Mangar  wrote:
>
> Did you commit after the updates?
>
> 2009/1/27 revathy arun 
>
> > Hi,
> >
> > I have downloade solr1.3.0 .
> >
> > I need to index chinese content ,for this i have defined a new field in
> the
> > schema
> >
> > as
> >
> >
> >  > positionIncrementGap="100">
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> >
> >
> > I beleive solr1.3 already has the cjkanalyzer by default.
> >
> > my schema in the testing stage has only 2 fields
> >
> >  required="true"
> > />
> >
> > 
> >
> >
> >
> > However when i index the chinese text into content , no index is being
> > created.i dont see any errors in tomcat as well .
> >
> > this is only entry in tomcat on updating
> >
> > Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute
> > INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=191
> >
> >  I have attached the chinese text file for reference.
> >
> >
> >
> > Regards
> >
> > sujatha
> >
> >
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: multilanguage prototype

2009-01-27 Thread Shalin Shekhar Mangar
Did you commit after the updates?

2009/1/27 revathy arun 

> Hi,
>
> I have downloade solr1.3.0 .
>
> I need to index chinese content ,for this i have defined a new field in the
> schema
>
> as
>
>
>  positionIncrementGap="100">
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
>
>
>
> I beleive solr1.3 already has the cjkanalyzer by default.
>
> my schema in the testing stage has only 2 fields
>
>  />
>
> 
>
>
>
> However when i index the chinese text into content , no index is being
> created.i dont see any errors in tomcat as well .
>
> this is only entry in tomcat on updating
>
> Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=191
>
>  I have attached the chinese text file for reference.
>
>
>
> Regards
>
> sujatha
>
>
>



-- 
Regards,
Shalin Shekhar Mangar.


multilanguage prototype

2009-01-27 Thread revathy arun
Hi,

I have downloaded Solr 1.3.0.

I need to index Chinese content. For this I have defined a new field in the
schema as:




















I believe Solr 1.3 already includes the CJKAnalyzer by default.

My schema in the testing stage has only 2 fields:







However, when I index the Chinese text into content, no index is being
created. I don't see any errors in Tomcat either.

This is the only entry in Tomcat on updating:

Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=191

I have attached the Chinese text file for reference.



Regards

sujatha
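For what it's worth, Lucene's CJKAnalyzer indexes runs of CJK characters as overlapping two-character tokens (bigrams). A toy sketch of that tokenization, just to illustrate what ends up in the index (this is not the actual analyzer code):

```python
def cjk_bigrams(text):
    """Toy sketch of CJK bigram tokenization: emit overlapping
    two-character tokens, the way CJKAnalyzer indexes runs of CJK
    characters. Illustration only, not the real Lucene implementation."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) <= 1:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]

print(cjk_bigrams("西尼羅河病毒"))  # -> ['西尼', '尼羅', '羅河', '河病', '病毒']
```

If the content field really is analyzed this way, queries share bigrams with the indexed text and should match; if nothing is indexed at all, the problem is more likely the add/commit step than the analyzer.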
西尼羅河病毒)

西尼羅河病毒症﹕重要資訊
什麼是西尼羅河病毒症? 西尼羅河病毒症(WNV)是一種具有潛在嚴重後果的疾病。據專家認為,西尼羅河病毒症已在北美形成季節性流行病,夏 
季開始發病並延續到秋季。本資訊單含有重要資訊,可幫助您認識及預防西尼羅河病毒症。 採取何種措施可以防止西尼羅河病毒症(WNV)? 
避免西尼羅河病毒症的最簡單及最有效方法是避免受蚊子叮咬。 • • • • 在戶外時, 應使用含有的驅蟲劑。 請遵循包裝物上的使用說明。 
許多蚊子最喜歡出來活動的時間是傍晚和凌晨。您可考慮此時在室內活動, 或使用驅蟲劑並穿長衣長褲。 穿淺色衣服有助於察覺落在身上的蚊子。 
應確保家中的門窗有完好的紗門紗窗,以便將蚊子擋在室外。 消除蚊子孳生地﹕倒乾花盆、桶、罐中的積水,經常給寵物食碟換水,每周給鳥類沐浴池換水,給輪胎秋千鑽排水孔 
,在不使用兒童涉水池時將水倒乾並翻轉放置。 西尼羅河病毒症有何症狀? 西尼羅河病毒症影響中樞神經系統。症狀因人而異。 • 少數人會出現嚴重症狀。 
在感染西尼羅河病毒者中,每150人大約有一人會發生嚴重的病情。嚴重症狀可能包括﹕ 
高熱、頭痛、脖子僵硬、感覺遲鈍、神志迷惑、昏迷、顫抖、抽搐、肌肉無力、失明、麻木、癱瘓。這些症狀可能 持續幾周,神經性影響可能永久存在。 • 
有些人會出現輕微症狀。最多達20%的受感染者會顯示輕微症狀,其中包括發熱、頭痛、身體疼痛、惡心、嘔吐, 
有時淋巴節會腫大,或在胸部、腹部、背部出現皮疹。症狀通常持續幾天,即便健康人也會病几周。 • 大多數人沒有症狀。 
大約80%(5人中的4人)感染西尼羅河病毒的人均不會出現任何症狀。

此症如何傳染? • 受感染的蚊子。西尼羅河病毒症通常透過受感染的蚊子叮咬而感染。蚊子在叮咬受感染的鳥類之後即會成為西尼羅 河病毒的攜帶者。 
然後,受感染的蚊子可能透過叮咬而將西尼羅河病毒症傳染給人類和其它動物。

2004年8月

第1頁,共2頁

西尼羅河病毒症﹕重要資訊 (接上頁) • • 輸血、器官移植、母親對孩子傳染。在極少數病例中,西尼羅河病毒症還會透過輸血、器官移植、哺乳傳染、甚至 
由母親在懷孕期間傳染給孩子。 不會透過接觸傳染。西尼羅河病毒症不會透過偶然接觸(例如接觸或親吻)感染病毒這而傳染。

受感染後經過多長時間才會發病? 人們被受感染的蚊子叮咬後,通常經過3至14天才會出現症狀。 對感染西尼羅河病毒症者如何治療? 
對感染西尼羅河病毒症者沒有特別的治療方法。症狀輕微者會出現發熱和疼痛現象,不久即會自動消失。症狀較嚴重者通常 
需要前往醫院,接受支援性治療,其中包括輸液,呼吸輔助,護理服務。 如果我感到自己患有西尼羅河病毒症,應該怎麼辦? 
西尼羅河病毒症的輕微患者會自動康復,因此感染此病毒者不一定需要就醫。如果您出現嚴重的西尼羅河病毒症狀, 
例如不尋常的頭痛或神志迷惑,就應當立即就醫。西尼羅河病毒症的嚴重患者通常需要住院治療。如果孕婦或正在哺乳的母 
親出現可能是西尼羅河病毒症的症狀,我們鼓勵您告訴醫生。 感染西尼羅河病毒症的風險有多大? 
年齡在50歲以上者可能病情較嚴重。年齡在50歲以上者如果出現西尼羅河病毒症狀,則可能症狀較為嚴重,因此應特別注 意避免受蚊子叮咬。 
經常在戶外活動者風險較大。長時間在戶外活動者較有可能被受感染的蚊子叮咬。如果長時間在戶外工作或玩耍,應特別 注意避免受蚊子叮咬。 
在醫療過程中患病的風險很小。手術用血在使用前要經過西尼羅河病毒的檢測。透過輸血及器官移植而感染西尼羅河病毒 
症的風險很小,因此需要動手術者不應因此病毒而不動手術。 如果您對此有疑慮, 應在手術前與醫生商議。 
懷孕和哺乳不會增高感染西尼羅河病毒症的風險。西尼羅河病毒症對胎兒或經哺乳傳染給幼兒的風險正在評估中。請向您 的醫生表明您的顧慮。 
CDC正在針對西尼羅河病毒症採取何種措施? CDC正在與各州和地方衛生部門、美國食品及藥品管理局、其它政府機構、以及私營企業界合作,共同準備治療及預防新 
發生的西尼羅河病毒症病例。

CDC正在採取某些措施,其中包括﹕
• 協辦全國範圍的電子資料庫,供各州交流關於西尼羅河病毒症的資訊 2004年8月 第2頁,共2頁

西尼羅河病毒症﹕重要資訊 (接上頁) • • • • 幫助各州制定和執行經過改進的蚊子預防和控制計劃 開發更好、更快的試驗方法, 
用以發現及診斷西尼羅河病毒症 為媒體、公眾、醫療專業人士設立新的教育工具和計劃 開辦新的西尼羅河病毒症化驗室

我還需要知道哪些事項? 如果發現死鳥﹕ 請勿空手接觸死鳥。應與當地衛生部門聯絡,詢問如何報告及處理死鳥。

若想瞭解更多資訊,請瀏覽致電CDC公眾答復專線﹕ (888) 246-2675 (英文), (888) 246-2857 (西班牙), or (866) 
874-2646 (打字電話)

2004年8月

第2頁,共2頁



Re: Text classification with Solr

2009-01-27 Thread Hannes Carl Meyer
>>Instead of indexing documents about 'sports' and searching for hits
>>based upon 'basketball', 'football' etc.. I simply want to index the
>>taxonomy and classify documents into it.  This is an ancient
>>AI/Data-Mining discipline.. but the standard methods of 'indexing' the
>>taxonomy are/were primitive compared to what one /could/ do with
>>something like Lucene.
Yeah, I know it; the challenge with this method is the calculation of the score
and the parametrization of thresholds.

Is it really necessary to use Solr for this? Things go much faster with the
Lucene low-level API, and faster still if you load the classification
corpus into RAM.

On Mon, Jan 26, 2009 at 7:24 PM, Neal Richter  wrote:

> Thanks for the link Shalin... played with that a while back.. It's
> possibly got some indirect possibilities.
>
> On Mon, Jan 26, 2009 at 10:46 AM, Hannes Carl Meyer 
> wrote:
> > I didn't understand: is the corpus of documents you want to use to
> > classify fixed?
>
> Assume the 'documents' are not stored in the same index and I want to
> only store the taxonomy or ontology in this index.
>
> Instead of indexing documents about 'sports' and searching for hits
> based upon 'basketball', 'football' etc.. I simply want to index the
> taxonomy and classify documents into it.  This is an ancient
> AI/Data-Mining discipline.. but the standard methods of 'indexing' the
> taxonomy are/were primitive compared to what one /could/ do with
> something like Lucene.
>
> Here's a 2007 research paper that used Lucene directly for
> classification, but doing the inverse of what I described:
> http://www.cs.ucl.ac.uk/staff/R.Hirsch/papers/gecco_HHS.pdf
>
> >>>previously suggested procedure of 1) store document 2) execute
> >>>more-like-this and 3) delete document would be too slow.
> > Do you mean the document to classify?
> > Why do you then want to put it into the index (very expensive), you just
> > need the contents of it to build a query!
>
> Exactly.. in the December Taxonomy thread Walter Underwood outlined a
> store/classify/delete procedure.  Too slow if you have no need to
> index the document itself.
>
> - Neal
>
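The scheme discussed in this thread (index the taxonomy entries as documents, build a query from the text to classify, and take the top-scoring hit as the category) can be sketched with a toy term-overlap score. The taxonomy, its terms, and the scoring below are invented stand-ins; a real implementation would index the categories in Lucene or Solr and use the engine's relevance score instead.

```python
from collections import Counter

# Hypothetical toy taxonomy: category -> descriptive terms. In the scheme
# discussed above these would be indexed documents (e.g. in a Lucene
# RAMDirectory); here a plain dict stands in for the index.
TAXONOMY = {
    "sports":     ["basketball", "football", "team", "score", "league"],
    "technology": ["software", "index", "server", "query", "lucene"],
}

def classify(doc_text, taxonomy=TAXONOMY):
    """Score each category by term overlap with the document and return
    (best_category, score). A real system would use the search engine's
    relevance score rather than this raw overlap count."""
    doc_terms = Counter(doc_text.lower().split())
    scores = {
        cat: sum(doc_terms[t] for t in terms)
        for cat, terms in taxonomy.items()
    }
    return max(scores.items(), key=lambda kv: kv[1])

print(classify("the lucene index server answers each query fast"))
# -> ('technology', 4)
```

Because the document is never added to the index, this avoids the store/classify/delete round trip mentioned above.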