Re: Standard request with functional query

2008-12-18 Thread Chris Hostetter

: Thanks for the response, but how would you make recency a factor in
: scoring documents with the standard request handler?
: The query (title:iphone OR bodytext:iphone OR title:firmware OR
: bodytext:firmware) AND _val_:"ord(dateCreated)"^0.1
: seems to do something very similar to just sorting by dateCreated
: rather than having dateCreated be a part of the score.

you have to look at the score explanations (debugQuery=true) and decide 
what boost is appropriate.  there are no magic numbers that work for 
everyone.
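
For example, one way to run that experiment (URLs unencoded for readability; the boost values 0.1 and 2.0 are illustrative, not a recommendation):

```
http://localhost:8983/solr/select?q=(title:iphone OR bodytext:iphone) AND _val_:"ord(dateCreated)"^0.1&debugQuery=true
http://localhost:8983/solr/select?q=(title:iphone OR bodytext:iphone) AND _val_:"ord(dateCreated)"^2.0&debugQuery=true
```

The score explanations in the debug output show how much of each document's score comes from the term clauses versus the function query, which is what makes the boost value tunable.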

: 
: Thanks,
: Sammy
: 
: On Thu, Dec 4, 2008 at 1:35 PM, Sammy Yu temi...@gmail.com wrote:
:  Hi guys,
: I have a standard query that searches across multiple text fields such as
:  q=title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware
: 
:  This comes back with documents that have iphone and firmware (I know I
:  can use dismax handler but it seems to be really slow), which is
:  great.  Now I want to give some more weight to more recent documents
:  (there is a dateCreated field in each document).
: 
:  So I've modified the query as such:
:  (title:iphone OR bodytext:iphone OR title:firmware OR
:  bodytext:firmware) AND _val_:"ord(dateCreated)"^0.1
:  URLencoded to 
q=(title%3Aiphone+OR+bodytext%3Aiphone+OR+title%3Afirmware+OR+bodytext%3Afirmware)+AND+_val_%3Aord(dateCreated)^0.1
: 
:  However, the results are not as one would expect.  The first few
:  documents only come back with the word iphone and appear to be sorted
:  by dateCreated.  It seems to completely ignore the score and use the
:  dateCreated field for the score.
: 
:  On a not directly related issue it seems like if you put the weight
:  within the double quotes:
:  (title:iphone OR bodytext:iphone OR title:firmware OR
:  bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"
: 
:  the parser complains:
:  org.apache.lucene.queryParser.ParseException: Cannot parse
:  '(title:iphone OR bodytext:iphone OR title:firmware OR
:  bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"': Expected ',' at
:  position 16 in 'ord(dateCreated)^0.1'
: 
:  Thanks,
:  Sammy
: 
: 



-Hoss
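
As an aside, the URL-encoded form of the query quoted above can be produced programmatically with `java.net.URLEncoder`. A minimal sketch (the helper class name is ours; note that `URLEncoder` also escapes parentheses and `^`, which the server decodes transparently):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not from the thread): produces the
// application/x-www-form-urlencoded form of a Solr q parameter.
public class SolrQueryEncoder {
    public static String encode(String q) {
        try {
            // spaces become '+', ':' becomes %3A, '(' %28, '^' %5E, etc.
            return URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(encode(
            "(title:iphone OR bodytext:iphone) AND _val_:\"ord(dateCreated)\"^0.1"));
    }
}
```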



Solrj - Exception in thread main java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.solr.common.util.NamedList

2008-12-18 Thread Sajith Vimukthi
Hi all,

 

I used the sample code given below and tried to run it with all the relevant
jars. I received the exception written below.

 

package test.general;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.SolrParams;

import java.io.IOException;
import java.util.Collection;
import java.util.HashSet;
import java.util.Random;
import java.util.List;

/**
 * Connect to Solr and issue a query
 */
public class SolrJExample {

  public static final String[] CATEGORIES = {"a", "b", "c", "d"};

  public static void main(String[] args) throws IOException, SolrServerException {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr/update");
    Random rand = new Random();

    //Index some documents
    Collection<SolrInputDocument> docs = new HashSet<SolrInputDocument>();
    for (int i = 0; i < 10; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("link", "http://non-existent-url.foo/" + i + ".html");
      doc.addField("source", "Blog #" + i);
      doc.addField("source-link", "http://non-existent-url.foo/index.html");
      doc.addField("subject", "Subject: " + i);
      doc.addField("title", "Title: " + i);
      doc.addField("content", "This is the " + i + "(th|nd|rd) piece of content.");
      doc.addField("category", CATEGORIES[rand.nextInt(CATEGORIES.length)]);
      doc.addField("rating", i);
      System.out.println("Doc[" + i + "] is " + doc);
      docs.add(doc);
    }

    UpdateResponse response = server.add(docs);
    System.out.println("Response: " + response);
    //Make the documents available for search
    server.commit();
    //create the query
    SolrQuery query = new SolrQuery("content:piece");
    //indicate we want facets
    query.setFacet(true);
    //indicate what field to facet on
    query.addFacetField("category");
    //we only want facets that have at least one entry
    query.setFacetMinCount(1);
    //run the query
    QueryResponse results = server.query(query);
    System.out.println("Query Results: " + results);
    //print out the facets
    List<FacetField> facets = results.getFacetFields();
    for (FacetField facet : facets) {
      System.out.println("Facet:" + facet);
    }
  }
}

 

 

The exception :

 

Exception in thread "main" java.lang.ClassCastException: java.lang.Long
cannot be cast to org.apache.solr.common.util.NamedList
  at org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
  at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
  at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
  at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
  at test.general.SolrJExample.main(SolrJExample.java:48)

 

 

Can someone help me out.

 

Regards,

Sajith Vimukthi Weerakoon

Associate Software Engineer | ZONE24X7

| Tel: +94 11 2882390 ext 101 | Fax: +94 11 2878261 |

http://www.zone24x7.com

 




Re: date facets doubt

2008-12-18 Thread Marc Sturlese

has anyone experienced this problem?
Can't find an explanation...

Thanks in advance


Marc Sturlese wrote:
 
 Hey there,
 
 1.- I am trying to use date facets but I am facing a problem. I want to
 use the same field to do 2 facet classifications. I want to show the count
 of the docs of the last week and the count of the docs of the last month.
 What I am doing is:
 
 <!-- Docs indexed last month -->
   <str name="facet.date">source_date</str>
   <str name="facet.date.start">NOW/DAY-1MONTH</str>
   <str name="facet.date.end">NOW/DAY</str>
   <str name="facet.date.gap">+1MONTH</str>
 
 <!-- Docs inserted last week -->
   <str name="facet.date">source_date</str>
   <str name="facet.date.start">NOW/DAY-7DAY</str>
   <str name="facet.date.end">NOW/DAY</str>
   <str name="facet.date.gap">+7DAY</str>
 
 What I am getting as a result is 2 facet results that are exactly the same
 (the result is just the first facet shown two times):
 
 <lst name="facet_dates">
 <lst name="source_date">
 <int name="2008-12-10T00:00:00Z">45</int>
 <str name="gap">+1MONTH</str>
 <date name="end">2008-12-17T00:00:00Z</date>
 </lst>
 <lst name="source_date">
 <int name="2008-12-10T00:00:00Z">45</int>
 <str name="gap">+1MONTH</str>
 <date name="end">2008-12-17T00:00:00Z</date>
 </lst>
 </lst>
 
 I suppose I am doing something wrong in the syntax... any advice?
 Thanks in advance
 
 

-- 
View this message in context: 
http://www.nabble.com/date-facets-doubt-tp21050107p21069438.html
Sent from the Solr - User mailing list archive at Nabble.com.
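
A possible explanation, though not confirmed in this thread: date facet results are keyed by field name, so two facet.date specifications over the same field collide and only one set of start/end/gap parameters takes effect, which would produce exactly the duplicated block shown above. Later Solr releases (after 1.3) allow renaming the output key with a local param, which permits faceting the same field twice; a hedged sketch:

```xml
<!-- sketch only: the {!key=...} local-params syntax is from Solr releases
     after 1.3, and the parameter values are illustrative -->
<str name="facet.date">{!key=last_month facet.date.start='NOW/DAY-1MONTH'
    facet.date.end='NOW/DAY' facet.date.gap='+1MONTH'}source_date</str>
<str name="facet.date">{!key=last_week facet.date.start='NOW/DAY-7DAY'
    facet.date.end='NOW/DAY' facet.date.gap='+7DAY'}source_date</str>
```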



Re: Solrj - Exception in thread main java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.solr.common.util.NamedList

2008-12-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
which version of the server are you using? The SolrJ documentation says
that the binary format works only with Solr 1.3.

On Thu, Dec 18, 2008 at 2:49 PM, Sajith Vimukthi saji...@zone24x7.com wrote:


 Hi all,



 I used the sample code given below and tried to run with all the relevant
 jars. I receive the exception written below.



 [...]





 Can someone help me out.



 Regards,

 Sajith Vimukthi Weerakoon

 Associate Software Engineer | ZONE24X7

 | Tel: +94 11 2882390 ext 101 | Fax: +94 11 2878261 |

 http://www.zone24x7.com







-- 
--Noble Paul
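
Two related things worth checking (our assumptions, not confirmed in the thread): CommonsHttpSolrServer is normally constructed with the core's base URL rather than the /update path, since SolrJ appends the request handler path itself; and a SolrJ 1.3 client talking to an older server can be switched away from the binary (javabin) response format:

```java
// sketch; assumes the SolrJ 1.3 API
CommonsHttpSolrServer server =
    new CommonsHttpSolrServer("http://localhost:8080/solr");
// when the server side is older than Solr 1.3, fall back to XML responses:
server.setParser(new XMLResponseParser());
```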


[SolrJ] SolrException: missing content stream

2008-12-18 Thread Gunnar Wagenknecht
Hi,

I'm using SolrJ to index a couple of documents. I do this in batches of
50 docs to save some memory. I call SolrServer#add(Collection)
for each batch.

For some reason, I get the following exception:
org.apache.solr.common.SolrException: missing content stream
at
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:114)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:147)
at
org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)

Any ideas what could be the issue? It actually worked fine when I added
only one doc at a time.

-Gunnar

-- 
Gunnar Wagenknecht
gun...@wagenknecht.org
http://wagenknecht.org/



Multi language search help

2008-12-18 Thread Sujatha Arun
Hi,
I am prototyping language search using Solr 1.3. I have 3 fields in the
schema: id, content and language.

I am indexing 3 PDF files; the languages are foroyo, Chinese and Japanese.

I use xpdf to convert the content of the PDFs to text and push the text to
Solr in the content field.

What is the analyzer that I need to use for the above?

By using the default text analyzer and posting this content to Solr, I am
not getting any results.

Does Solr support stemming for the above languages?

Regards
Sujatha
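
For Chinese and Japanese, one option that ships with Solr 1.3 is the CJK tokenizer, which indexes overlapping character bigrams rather than performing true stemming (a sketch; the field type name is ours):

```xml
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- emits character bigrams, so phrase-like matching works without
         language-specific stemming -->
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>
```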


Re: [SolrJ] SolrException: missing content stream

2008-12-18 Thread Ryan McKinley

are you sure the Collection is not empty?
what version are you running?
what do the server logs say when you get this error on the client?
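
The empty-collection case Ryan asks about is worth guarding against explicitly, since an add request with no documents sends no content stream at all (a sketch; variable names are ours, and this assumes SolrJ's SolrServer API):

```java
// sketch: only issue the add when the current batch actually has documents
if (docs != null && !docs.isEmpty()) {
    server.add(docs);
    docs.clear(); // start the next batch empty
}
```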

On Dec 18, 2008, at 6:42 AM, Gunnar Wagenknecht wrote:


Hi,

I'm using SolrJ to index a couple of documents. I do this in batches of
50 docs to save some memory. I call SolrServer#add(Collection)
for each batch.

For some reason, I get the following exception:
org.apache.solr.common.SolrException: missing content stream
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:114)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:147)
at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)

Any ideas what could be the issue? It actually worked fine when I added
only one doc at a time.

-Gunnar

--
Gunnar Wagenknecht
gun...@wagenknecht.org
http://wagenknecht.org/





Change in config file (synonym.txt) requires container restart?

2008-12-18 Thread Sagar Khetkade

Hi,
 
I am using the SolrJ client to connect to the Solr 1.3 server, and the whole
POC (a feasibility study) resides in a Tomcat web server. If I make any change
to the synonym.txt file to add a synonym, I have to restart the Tomcat server
for the change to take effect. The synonym filter factory that I am using
appears in both the index and query analyzers of the field type in schema.xml.
Please tell me whether this approach is good, or whether there is another way
to make the change take effect for searching without restarting the Tomcat
server.
 
Thanks and Regards,
Sagar Khetkade
_
Chose your Life Partner? Join MSN Matrimony FREE
http://in.msn.com/matrimony

Re: Change in config file (synonym.txt) requires container restart?

2008-12-18 Thread Mark Miller

Sagar Khetkade wrote:

Hi,
 
I am using SolrJ client to connect to the Solr 1.3 server and the whole POC (doing a feasibility study ) reside in Tomcat web server. If any change I am making in the synonym.txt file to add the synonym in the file to make it reflect I have to restart the tomcat server. The synonym filter factory that I am using are in both in analyzers for type index and query in schema.xml. Please tell me whether this approach is good or any other way to make the change reflect while searching without restarting of tomcat server.
 
Thanks and Regards,

Sagar Khetkade
_
Chose your Life Partner? Join MSN Matrimony FREE
http://in.msn.com/matrimony
  

You can also reload the core.

- Mark
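
In a multicore setup the reload can be triggered over HTTP through the CoreAdmin handler (a sketch; the core name core0 is illustrative, and this assumes a solr.xml multicore configuration):

```
http://localhost:8080/solr/admin/cores?action=RELOAD&core=core0
```

The reload picks up changes to configuration files such as synonym.txt without restarting the servlet container.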


Re: Get All terms from all documents

2008-12-18 Thread Erick Erickson
I think I'd pin the user down and have him give me the real-world
use-cases that require this, then see if there's a more reasonable
 way to satisfy that use-case. Do they want type-ahead? What
is the user of the system going to see? Because, for instance,
a drop-down of 10,000 terms is totally useless.

Best
Erick

On Wed, Dec 17, 2008 at 10:02 PM, roberto miles.c...@gmail.com wrote:

 Grant

 It's completely crazy to do something like this, I know, but the customer
 wants it. I'm really trying to figure out how to do it in a better way, maybe
 using the (auto suggest) filter from Solr 1.3 to get all the words starting
 with some letter and caching them on the client side. Our client is going to
 be written in Swing. What do you guys think?

 Thanks,

 On Wed, Dec 17, 2008 at 8:05 PM, Grant Ingersoll gsing...@apache.org
 wrote:

  All terms from all docs?  Really?
 
  At any rate, see http://wiki.apache.org/solr/TermsComponent  May need a
  mod to not require any field, but for now you can enter all fields (which
  you can get from LukeRequestHandler)
 
  -Grant
 
 
 
  On Dec 17, 2008, at 2:17 PM, roberto wrote:
 
  Hello,
 
  I need to get all terms from all documents to be placed in my interface
  almost like the facets, how can i do it?
 
  thanks
 
  --
  Without love, we are birds with broken wings.
  Morrie
 
 
  --
  Grant Ingersoll
 
  Lucene Helpful Hints:
  http://wiki.apache.org/lucene-java/BasicsOfPerformance
  http://wiki.apache.org/lucene-java/LuceneFAQ
 
 
 
 
 
 
 
 
 
 
 


 --
 Without love, we are birds with broken wings.
 Morrie



Highlighting broken? String index out of range: 35

2008-12-18 Thread Steffen B.

Hi everyone,
it seems that I've run into another problem with my Solr setup. :/ The
highlighter just won't highlight anything, no matter which fragmenter or
config params I use.
Here's an example, taken straight out of the example solrconfig.xml:
<requestHandler name="dismax" class="solr.SearchHandler">
<lst name="defaults">
 <str name="defType">dismax</str>
 <str name="echoParams">explicit</str>
 <float name="tie">0.01</float>
 <str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
 </str>
 <str name="pf">
text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
 </str>
 <str name="bf">
ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
 </str>
 <str name="fl">
id,name,price,score
 </str>
 <str name="mm">
2&lt;-1 5&lt;-2 6&lt;90%
 </str>
 <int name="ps">100</int>
 <str name="q.alt">*:*</str>
 <!-- example highlighter config, enable per-query with hl=true -->
 <str name="hl.fl">text features name</str>
 <!-- for this field, we want no fragmenting, just highlighting -->
 <str name="f.name.hl.fragsize">0</str>
 <!-- instructs Solr to return the field itself if no query terms are
  found -->
 <str name="f.name.hl.alternateField">name</str>
 <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
</lst>
  </requestHandler>

Whenever I try to activate the highlighter, it produces an error:
http://localhost:8983/solr/select/?q=ipod&version=2.2&start=0&rows=10&indent=on&qt=dismax&hl=true

HTTP ERROR: 500

String index out of range: 35

java.lang.StringIndexOutOfBoundsException: String index out of range: 35
at java.lang.String.substring(Unknown Source)
at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:239)
at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:310)
at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:83)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:171)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

That's what happens with the example setup - on my project it simply won't
highlight anything at all, no matter what I try. :| Can anyone shed some
light on this?
-- 
View this message in context: 
http://www.nabble.com/Highlighting-broken--String-index-out-of-range%3A-35-tp21073102p21073102.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Highlighting broken? String index out of range: 35

2008-12-18 Thread Steffen B.

Alright, I pinned it down, I think...
The cause of the error seems to be the "features" field, which has
termVectors="true", termPositions="true" and termOffsets="true". The other 2
fields ("name" and "text") work; they have the same type but lack the
term*-attributes. When you overwrite the default hl.fl with something like
"name text" it works, but add "features" to it and you get the error.


Steffen B. wrote:
 
 [...]

-- 
View this message in context: 
http://www.nabble.com/Highlighting-broken--String-index-out-of-range%3A-35-tp21073102p21073356.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr openning many threads

2008-12-18 Thread Alexander Ramos Jardim
Hello,

I can see from a thread dump that Solr opens a lot of threads.

How does Solr use these threads? Is there more than one thread for search
in Solr? Does Solr use any kind of work manager, or are the threads plain
java.lang.Thread instances? How many concurrent threads does Solr create,
and how does it manage them?

-- 
Alexander Ramos Jardim


Re: Highlighting broken? String index out of range: 35

2008-12-18 Thread Koji Sekiguchi

I think you are facing this problem:

https://issues.apache.org/jira/browse/SOLR-925

I'm looking into the issue now to solve it; I'm not sure that I can fix it
in the time I have, though...


Koji

Steffen B. wrote:

[...]




Problem in Date Format in Solr 1.3

2008-12-18 Thread rohit arora


Hi

I have upgraded from Solr 1.2 to Solr 1.3. When I copy all the fieldtype
tags in the types section of schema.xml from Solr 1.2 to Solr 1.3, it
gives an error:

SEVERE: org.apache.solr.common.SolrException: Invalid Date in Date Math 
String:'2006-Oct-10T10:06:13Z'

Can you help me with this problem?

with regards
   Rohit Arora
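The error above arises because Solr's DateField accepts only the canonical ISO 8601 form (2006-10-10T10:06:13Z), while the indexed value spells the month out as "Oct". A minimal sketch of converting such values before posting (the class and method names here are illustrative, not from the original thread):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class FixSolrDate {
    // Convert a "2006-Oct-10T10:06:13Z"-style value into the ISO 8601
    // form Solr's DateField requires: "2006-10-10T10:06:13Z".
    public static String toSolrDate(String raw) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MMM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(in.parse(raw));
    }

    public static void main(String[] args) throws ParseException {
        // Prints the corrected form of the value from the error message.
        System.out.println(toSolrDate("2006-Oct-10T10:06:13Z"));
    }
}
```

Applying this kind of normalization in the indexing client avoids the "Invalid Date in Date Math String" exception.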





  

Re: Solr openning many threads

2008-12-18 Thread Yonik Seeley
On Thu, Dec 18, 2008 at 9:03 AM, Alexander Ramos Jardim
alexander.ramos.jar...@gmail.com wrote:
 I can see from a thread dump that Solr opens a lot of threads.

 How does Solr use these threads? Is there more than one thread for search
 in Solr? Does Solr use any kind of work manager, or are the threads plain
 java.lang.Thread? How many concurrent threads does Solr create, and how
 does it manage them?

Unless distributed search is being used, Solr currently has one single
thread executor for background warming.
There is a thread-per-request, but that's just the way servlet
containers work (Jetty, Tomcat, etc.).
You can control the max number of threads that are created in the
servlet container config.
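The servlet-container knob Yonik mentions lives in the container's own config; in Tomcat, for example, it's the maxThreads attribute on the HTTP connector (the port and values below are illustrative defaults, not from this thread):

```xml
<!-- Tomcat server.xml: cap the number of request-handling threads -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="150"
           minSpareThreads="4" />
```

Jetty has an equivalent setting on its thread pool in jetty.xml.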

-Yonik


Solr and Autocompletion

2008-12-18 Thread Kashyap, Raghu
Hi,

  One of the things we are looking for is to autofill the keywords when people 
start typing (e.g. Google autofill).

Currently we are using the RangeQuery. I read about the PrefixQuery and feel 
that it might be appropriate for this kind of implementation.

Has anyone implemented the autofill feature? If so what do you recommend?

Thanks,
Raghu


RE: looking for multilanguage indexing best practice/hint

2008-12-18 Thread Daniel Alheiros
Hi Sujatha.

I've developed a search system for 6 different languages and as it was
implemented on Solr 1.2 all those languages are part of the same index,
using different fields for each so I can have different analyzers for
each one.

Like:
content_chinese
content_english
content_russian
content_arabic

I've also defined a language field that I use to be able to separate
those on query time.

As you are going to implement it using Solr 1.3 I would rather create
one core per language and keep my schema simpler without the _language
suffix. Each schema (one per language) would have only, say, content
which depending on its language will use a proper analyzer and filters.

Having a separate core per language is also good as the scores for a
language won't be affected by the indexing of documents in other
languages.
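A sketch of the single-index, field-per-language layout described above (the field type names are assumptions; use whatever analyzer chain suits each language):

```xml
<!-- schema.xml: one analyzed field per language, plus a language discriminator
     for filtering at query time -->
<field name="content_english" type="text_en"  indexed="true" stored="true"/>
<field name="content_chinese" type="text_cjk" indexed="true" stored="true"/>
<field name="content_russian" type="text_ru"  indexed="true" stored="true"/>
<field name="content_arabic"  type="text_ar"  indexed="true" stored="true"/>
<field name="language"        type="string"   indexed="true" stored="true"/>
```

With the per-core approach instead, each core's schema would define a single content field bound to that language's analyzer.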

Do you have any requirement for searching in any language, say q=test
and this term should be found in any language? If so, you may think of
distributed search to combine your results or even to take the same
approach I've taken as I couldn't use multi-core.

I'm also using the Dismax request handler; that's worth a look, as you can
pre-define some base query parts and also do score boosting behind the
scenes.

I hope it helps.

Regards,
Daniel 

-Original Message-
From: Sujatha Arun [mailto:suja.a...@gmail.com] 
Sent: 18 December 2008 04:15
To: solr-user@lucene.apache.org
Subject: Re: looking for multilanguage indexing best practice/hint

Hi,

I am prototyping language search using Solr 1.3. I have 3 fields in the
schema: id, content and language.

I am indexing 3 PDF files; the languages are foroyo, Chinese and Japanese.

I use xpdf to convert the content of the PDFs to text and push the text to
Solr in the content field.

What is the analyzer that I need to use for the above?

By using the default text analyzer and posting this content to Solr, I am
not getting any results.

Does Solr support stemming for the above languages?

Regards
Sujatha




On 12/18/08, Feak, Todd todd.f...@smss.sony.com wrote:

 Don't forget to consider scaling concerns (if there are any). There 
 are strong differences in the number of searches we receive for each 
 language. We chose to create separate schema and config per language 
 so that we can throw servers at a particular language (or set of 
 languages) if we needed to. We see 2 orders of magnitude difference 
 between our most popular language and our least popular.

 -Todd Feak

 -Original Message-
 From: Julian Davchev [mailto:j...@drun.net]
 Sent: Wednesday, December 17, 2008 11:31 AM
 To: solr-user@lucene.apache.org
 Subject: looking for multilanguage indexing best practice/hint

 Hi,
 From my study on solr and lucene so far it seems that I will use 
 single scheme.at least don't see scenario where I'd need more than
that.
 So question is how do I approach multilanguage indexing and multilang 
 searching. Will it really make sense for just searching word..or 
 rather I should supply lang param to search as well.

 I see there are those filters and already advised on them but I guess 
 question is more of a best practice.
 solr.ISOLatin1AccentFilterFactory, solr.SnowballPorterFilterFactory

 So solution I see is using copyField I have same field in different 
 langs or something using distinct filter.
 Cheers





http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.



Re: Solr and Autocompletion

2008-12-18 Thread Ryan McKinley

lots of options out there

Rather than doing a slow query like a prefix query, I think it's best to
index the n-grams so that autocomplete is a fast query.


http://www.mail-archive.com/solr-user@lucene.apache.org/msg06776.html
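One way to index the n-grams Ryan mentions is an edge n-gram field type along these lines (a sketch; the gram sizes and type name are illustrative, and the filter factory is part of the Solr analysis package rather than something shown in this thread):

```xml
<!-- schema.xml: analyze prefixes at index time so autocomplete becomes a
     cheap exact-term lookup instead of a PrefixQuery scan -->
<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

At query time the user's partial input is matched as a whole term against the indexed grams, so each keystroke is a fast term lookup.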



On Dec 18, 2008, at 11:56 AM, Kashyap, Raghu wrote:


Hi,

 One of things we are looking for is to Autofill the keywords when  
people start typing. (e.g. Google autofill)


Currently we are using the RangeQuery. I read about the PrefixQuery  
and feel that it might be appropriate for this kind of implementation.


Has anyone implemented the autofill feature? If so what do you  
recommend?


Thanks,
Raghu




Re: looking for multilanguage indexing best practice/hint

2008-12-18 Thread Chris Hostetter

: Subject: looking for multilanguage indexing best practice/hint
: References: 49483388.8030...@drun.net   
: 502b8706-828b-4eaa-886d-af0dccf37...@stylesight.com
: 8c0c601f0812170825j766cf005i9546b2604a19f...@mail.gmail.com
: In-Reply-To: 8c0c601f0812170825j766cf005i9546b2604a19f...@mail.gmail.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking



-Hoss



Re: Solr and Autocompletion

2008-12-18 Thread Chris Hostetter

: Subject: Solr and Autocompletion
: References: 49483388.8030...@drun.net
:  502b8706-828b-4eaa-886d-af0dccf37...@stylesight.com
:  8c0c601f0812170825j766cf005i9546b2604a19f...@mail.gmail.com
:  4949537a.3050...@drun.net
:  8599f2e4e80ecc44aee81fa2974ce2bd0c31d...@mail-sd1.ad.soe.sony.com
:  414cb3700812172015y2c0481c3hc6345392d514a...@mail.gmail.com
:  359a92830812180538q424a0744j3be8a109cec81...@mail.gmail.com
: In-Reply-To: 359a92830812180538q424a0744j3be8a109cec81...@mail.gmail.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email.  Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is hidden in that thread and gets less
attention.   It makes following discussions in the mailing list archives
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking



-Hoss



Re: [ANNOUNCE] Solr Logo Contest Results

2008-12-18 Thread Mathijs Homminga

Good choice!

Mathijs Homminga

Chris Hostetter wrote:

(replies to solr-user please)

On behalf of the Solr Committers, I'm happy to announce that the 
Solr Logo Contest is officially concluded. (Woot!)


And the Winner Is...
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg 


...by Michiel

We ran into a few hiccups during the contest, making it take longer 
than intended, but the result was a thorough process in which everyone 
went above and beyond to ensure that the final choice best reflected 
the wishes of the community.


You can expect to see the new logo appear on the site (and in the Solr 
app) in the next few weeks.


Congrats Michiel!


-Hoss



--
Knowlogy
Helperpark 290 C
9723 ZA Groningen
+31 (0)50 2103567
http://www.knowlogy.nl

mathijs.hommi...@knowlogy.nl
+31 (0)6 15312977




Re: Get All terms from all documents

2008-12-18 Thread roberto
Erick,

Thanks for the answer. Let me clarify: we would like to have a combobox
with the terms to guide the user in the search. I mean, if I have thousands
of documents and want to tell users how many documents in the index contain
a particular word, how can I do that?

thanks

On Thu, Dec 18, 2008 at 11:25 AM, Erick Erickson erickerick...@gmail.comwrote:

 I think I'd pin the user down and have him give me the real-world
 use-cases that require this, then see if there's a more reasonable
  way to satisfy that use-case. Do they want type-ahead? What
 is the user of the system going to see? Because, for instance,
 a drop-down of 10,000 terms is totally useless.

 Best
 Erick

 On Wed, Dec 17, 2008 at 10:02 PM, roberto miles.c...@gmail.com wrote:

  Grant
 
  It completely crazy do something like this i know, but the customer
 want´s,
  i´m really trying to figure out how to do it in a better way, maybe using
  the (auto suggest) filter from solr 1.3 to get all the words starting
 with
  some letter and cache the letter in the client side, out client is going
 to
  be write in swing, what do you guys think?
 
  Thanks,
 
  On Wed, Dec 17, 2008 at 8:05 PM, Grant Ingersoll gsing...@apache.org
  wrote:
 
   All terms from all docs?  Really?
  
   At any rate, see http://wiki.apache.org/solr/TermsComponent  May need
 a
   mod to not require any field, but for now you can enter all fields
 (which
   you can get from LukeRequestHandler)
  
   -Grant
  
  
  
   On Dec 17, 2008, at 2:17 PM, roberto wrote:
  
   Hello,
  
   I need to get all terms from all documents to be placed in my
 interface
   almost like the facets, how can i do it?
  
   thanks
  
   --
   Without love, we are birds with broken wings.
   Morrie
  
  
   --
   Grant Ingersoll
  
   Lucene Helpful Hints:
   http://wiki.apache.org/lucene-java/BasicsOfPerformance
   http://wiki.apache.org/lucene-java/LuceneFAQ
  
  
  
  
  
  
  
  
  
  
  
 
 
  --
  Without love, we are birds with broken wings.
  Morrie
 




-- 
Without love, we are birds with broken wings.
Morrie


Approximate release date for 1.4

2008-12-18 Thread Kay Kay
Just curious: do we have an approximate target release date for 1.4, or a
list of milestones / feature sets for it?


Re: [ANNOUNCE] Solr Logo Contest Results

2008-12-18 Thread Jeryl Cook
Looks cool :), how about a talking mascot as well?

Jeryl Cook
twoenc...@gmail.com

On Thu, Dec 18, 2008 at 1:38 PM, Mathijs Homminga
mathijs.hommi...@knowlogy.nl wrote:
 Good choice!

 Mathijs Homminga

 Chris Hostetter wrote:

 (replies to solr-user please)

 On behalf of the Solr Committers, I'm happy to announce that we the Solr
 Logo Contest is officially concluded. (Woot!)

 And the Winner Is...

 https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
 ...by Michiel

 We ran into a few hiccups during the contest making it take longer then
 intended, but the result was a thorough process in which everyone went above
 and beyond to ensure that the final choice best reflected the wishes of the
 community.

 You can expect to see the new logo appear on the site (and in the Solr
 app) in the next few weeks.

 Congrats Michiel!


 -Hoss


 --
 Knowlogy
 Helperpark 290 C
 9723 ZA Groningen
 +31 (0)50 2103567
 http://www.knowlogy.nl

 mathijs.hommi...@knowlogy.nl
 +31 (0)6 15312977






-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: Approximate release date for 1.4

2008-12-18 Thread Yonik Seeley
On Thu, Dec 18, 2008 at 2:43 PM, Kay Kay kaykay.uni...@gmail.com wrote:
 Just curious - if we have an approximate target release date for 1.4 / list
 of milestones / feature sets for the same.

Mid January.
Issues included: case-by-case analysis of how ready they are (and
obviously affected by committers scratching their own itch.)

-Yonik


Re: looking for multilanguage indexing best practice/hint

2008-12-18 Thread Julian Davchev
Thanks Erick,
I think I will go with different language fields, as I want to use
different stop words, analyzers etc.
I might also consider a schema per language so scaling is more flexible, as
I was already advised, but this will only really make sense if I have more
than one server, I guess; otherwise all the other data is duplicated for no
reason.
We have already made the decision that the language will be passed each
time in a search, so it won't make sense to search a query in any language.

As for CJKAnalyzer, at first look it doesn't seem to be in Solr (haven't
tried yet), and since I am a noob in Java I will check how it's done.
Will definitely give it a try.

Thanks a lot for the help.

Erick Erickson wrote:
 See the CJKAnalyzer for a start, StandardAnalyzer won't
 help you much.

 Also, tell us a little more about your requirements. For instance,
 if a user submits a query in Japanese, do you want to search
 across documents in the other languages too? And will you want
 to associate different analyzers with the content from different
 languages? You really have two options:

 if you want different analyzers used with the different languages,
 you probably have to index the content in different fields. That is
 a Chinese document would have a chinese_content field, a Japanese
 document would have a japanese_content field etc. Now you can
 associate a different analyzer with each *_content field.

 If the same analyzer would work for all three languages, you
 can just index all the content in a content field, and if you
 need to restrict searching to the language in which the query
 was submitted, you could always add a clause on the
 language, e.g. AND language:chinese

 Hope this helps
 Erick

 On Wed, Dec 17, 2008 at 11:15 PM, Sujatha Arun suja.a...@gmail.com wrote:

   
 Hi,

 I am prototyping lanuage search using solr 1.3 .I  have 3 fields in the
 schema -id,content and language.

 I am indexing 3 pdf files ,the languages are foroyo,chinese and japanese.

 I use xpdf to convert the content of pdf to text and push the text to solr
 in the content field.

 What is the analyzer  that i need to use for the above.

 By using the default text analyzer and posting this content to solr, i am
 not getting any  results.

 Does solr support stemmin for the above languages.

 Regards
 Sujatha




 On 12/18/08, Feak, Todd todd.f...@smss.sony.com wrote:
 
 Don't forget to consider scaling concerns (if there are any). There are
 strong differences in the number of searches we receive for each
 language. We chose to create separate schema and config per language so
 that we can throw servers at a particular language (or set of languages)
 if we needed to. We see 2 orders of magnitude difference between our
 most popular language and our least popular.

 -Todd Feak

 -Original Message-
 From: Julian Davchev [mailto:j...@drun.net]
 Sent: Wednesday, December 17, 2008 11:31 AM
 To: solr-user@lucene.apache.org
 Subject: looking for multilanguage indexing best practice/hint

 Hi,
 From my study on solr and lucene so far it seems that I will use single
 scheme.at least don't see scenario where I'd need more than that.
 So question is how do I approach multilanguage indexing and multilang
 searching. Will it really make sense for just searching word..or rather
 I should supply lang param to search as well.

 I see there are those filters and already advised on them but I guess
 question is more of a best practice.
 solr.ISOLatin1AccentFilterFactory, solr.SnowballPorterFilterFactory

 So solution I see is using copyField I have same field in different
 langs or something using distinct filter.
 Cheers




   

   



does this break Solr? dynamicField name=* type=ignored

2008-12-18 Thread Peter Wolanin
I'm seeing a weird effect with a '*' field.  In the example
schema.xml, there is a commented out sample:

   <!-- uncomment the following to ignore any fields that don't
already match an existing
field name or dynamic field, rather than reporting them as an error.
alternately, change the type="ignored" to some other type e.g.
"text" if you want
unknown fields indexed and/or stored by default -->
   <!--dynamicField name="*" type="ignored" /-->

We have this un-commented, and in the schema browser via the admin
interface I see that all non-dynamic fields get a type of ignored.

I see this in the Solr admin interface:

Field: uid
Dynamically Created From Pattern: *
Field Type: ignored

though the field definition is:

   <field name="uid" type="integer" indexed="true" stored="true"/>

Is this a bug in the admin interface, or a problem with using this '*'
in the schema?

Thanks,

Peter

-- 
--
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: does this break Solr? dynamicField name=* type=ignored

2008-12-18 Thread Yonik Seeley
Looks like it's a bug in the schema browser (i.e. just this display,
not the inner workings of Solr).
Could you open a JIRA issue for this?

-Yonik


On Thu, Dec 18, 2008 at 3:20 PM, Peter Wolanin peter.wola...@acquia.com wrote:
 I'm seeing a weird effect with a '*' field.  In the example
 schema.xml, there is a commented out sample:

   <!-- uncomment the following to ignore any fields that don't
 already match an existing
field name or dynamic field, rather than reporting them as an error.
alternately, change the type="ignored" to some other type e.g.
 "text" if you want
unknown fields indexed and/or stored by default -->
   <!--dynamicField name="*" type="ignored" /-->

 We have this un-commented, and in the schema browser via the admin
 interface I see that all non-dynamic fields get a type of ignored.

 I see this in the Solr admin interface:

 Field: uid
 Dynamically Created From Pattern: *
 Field Type: ignored

 though the field definition is:

   <field name="uid" type="integer" indexed="true" stored="true"/>

 Is this a bug in the admin interface, or a problem with using this '*'
 in the schema?

 Thanks,

 Peter

 --
 --
 Peter M. Wolanin, Ph.D.
 Momentum Specialist,  Acquia. Inc.
 peter.wola...@acquia.com



Re: Partitioning the index

2008-12-18 Thread Yonik Seeley
It's more related to how much memory you have on your boxes, how
resource intensive your queries are, how many fields you are trying to
facet on, what acceptable response times are, etc.

Anyway... a single box is normally good for between 5M and 50M docs,
but can fall out of that range (both up and down) depending on the
specifics.

-Yonik

On Wed, Dec 17, 2008 at 9:34 PM, s d s.d.sau...@gmail.com wrote:
 Hi,Is there a recommended index size (on disk, number of documents) for when
 to start partitioning it to ensure good response time?
 Thanks,
 S



Re: Get All terms from all documents

2008-12-18 Thread Erick Erickson
How do you get the word in the first place? If the combobox
is for all words in your index, it's probably completely useless
to provide this information because there is too much data to
guide the user at all. I mean a list of 10,000 words with some sort
of document frequency seems to me to require significant
developer work without adding to the user experience at all...

If that's the case, I'd really work with your customer and try
to persuade them that this is a feature that adds little value,
and that there are higher-value features you should do first.

But if you really, really require the information, here's what I
would recommend:

Use TermDocs/TermEnum to traverse your index gathering
this data *at index time*. Then create a *very special* document
that you also put in your index (stored, but not indexed
in this case) that contains a unique field (say "frequencies").

Upon startup of your searcher, read in this very special document,
parse it and create a map of words and frequencies that you use
to find the number of documents containing that word.

Hope this helps
Erick


On Thu, Dec 18, 2008 at 1:53 PM, roberto miles.c...@gmail.com wrote:

 Erick,

 Thanks for the answer, let me clarify the thing, we would like to have a
 combobox with the terms to guide the user in the search i mean, if a have
 thousands of documents and want to tell them how many documents in the base
 have the particular word, how can i do that?

 thanks

 On Thu, Dec 18, 2008 at 11:25 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  I think I'd pin the user down and have him give me the real-world
  use-cases that require this, then see if there's a more reasonable
   way to satisfy that use-case. Do they want type-ahead? What
  is the user of the system going to see? Because, for instance,
  a drop-down of 10,000 terms is totally useless.
 
  Best
  Erick
 
  On Wed, Dec 17, 2008 at 10:02 PM, roberto miles.c...@gmail.com wrote:
 
   Grant
  
   It completely crazy do something like this i know, but the customer
  want´s,
   i´m really trying to figure out how to do it in a better way, maybe
 using
   the (auto suggest) filter from solr 1.3 to get all the words starting
  with
   some letter and cache the letter in the client side, out client is
 going
  to
   be write in swing, what do you guys think?
  
   Thanks,
  
   On Wed, Dec 17, 2008 at 8:05 PM, Grant Ingersoll gsing...@apache.org
   wrote:
  
All terms from all docs?  Really?
   
At any rate, see http://wiki.apache.org/solr/TermsComponent  May
 need
  a
mod to not require any field, but for now you can enter all fields
  (which
you can get from LukeRequestHandler)
   
-Grant
   
   
   
On Dec 17, 2008, at 2:17 PM, roberto wrote:
   
Hello,
   
I need to get all terms from all documents to be placed in my
  interface
almost like the facets, how can i do it?
   
thanks
   
--
Without love, we are birds with broken wings.
Morrie
   
   
--
Grant Ingersoll
   
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
   
   
   
   
   
   
   
   
   
   
   
  
  
   --
   Without love, we are birds with broken wings.
   Morrie
  
 



 --
 Without love, we are birds with broken wings.
 Morrie



Data Import Request Handler problem: Odd performance behaviour for large number of records

2008-12-18 Thread Glen Newton
Hello,

I am using Solr 1.4 (solr-2008-11-19) with Lucene 2.4 dropped in instead of 2.9.

I am indexing 500k records using the JDBC Data Import Request Handler.

Config:
 Linux openSUSE 10.2 (X86-64)
 Dual dual-core 64-bit Xeon 3GHz Dell blade, 8GB RAM
 java version 1.6.0_07
 Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
 Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
 1GB heap for Tomcat
 DB: MySql on separate but similar server

I am finding that the when I do a Full-Import, followed by another
Full-import the import takes much longer the second and subsequent
times:
Run1 = 0:27:31.491
Run2 = 1:14:44.821
Run3 = 1:14:48.316
Run4 = 2:15:12.296
Run5 = 1:37:06.847

(I have run this ~10 times and got roughly the same results). I have
also monitored the load on the Solr machine and the databases machine
for any other activity that might impact.

The final Lucene index size is 923MB. The default clean = 'true', so
the index is cleared (emptied) each time, so I am concerned the second
run takes 4 times the time of the first run.

Am I doing something wrong here? Any help would be appreciated.

I have appended my data-config.xml below.

thanks,

Glen

<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://blue01/dartejos" user="USER" password="PASSWD"/>
<document name="products">
<entity name="item" query="select Publisher.name as pub,
Journal.title as jo, Article.rawUrl as textpath, Journal.issn,
Volume.number as vol, Volume.coverYear as year, Issue.number as iss,
Article.id, Article.title as ti, Article.abstract, Article.startPage as
startPage, Article.endPage as endPage from Publisher, Journal, Volume,
Issue, Article where Publisher.id = Journal.publisherId and Journal.id
= Volume.journalId and Volume.id = Issue.volumeId and Issue.id =
Article.issueId limit 50">
<field column="id" name="id" />
<field column="jo" name="id" />
<field column="issn" name="id" />
<field column="vol" name="id" />
<field column="year" name="id" />
<field column="iss" name="id" />
<field name="abstract" column="abstract"/>
<field name="title" column="title"/>
<field name="pub" column="pub"/>
<field name="textpath" column="textpath"/>
<field name="startPage" column="startPage"/>
<field name="endPage" column="endPage"/>
</entity>
</document>
</dataConfig>

-- 

-


Re: does this break Solr? dynamicField name=* type=ignored

2008-12-18 Thread Peter Wolanin
created issue:  https://issues.apache.org/jira/browse/SOLR-929

-Peter

On Thu, Dec 18, 2008 at 3:32 PM, Yonik Seeley ysee...@gmail.com wrote:
 Looks like it's a bug in the schema browser (i.e. just this display,
 not the inner workings of Solr).
 Could you open a JIRA issue for this?

 -Yonik


 On Thu, Dec 18, 2008 at 3:20 PM, Peter Wolanin peter.wola...@acquia.com 
 wrote:
 I'm seeing a weird effect with a '*' field.  In the example
 schema.xml, there is a commented out sample:

   <!-- uncomment the following to ignore any fields that don't
 already match an existing
field name or dynamic field, rather than reporting them as an error.
alternately, change the type="ignored" to some other type e.g.
 "text" if you want
unknown fields indexed and/or stored by default -->
   <!--dynamicField name="*" type="ignored" /-->

 We have this un-commented, and in the schema browser via the admin
 interface I see that all non-dynamic fields get a type of ignored.

 I see this in the Solr admin interface:

 Field: uid
 Dynamically Created From Pattern: *
 Field Type: ignored

 though the field definition is:

   <field name="uid" type="integer" indexed="true" stored="true"/>

 Is this a bug in the admin interface, or a problem with using this '*'
 in the schema?

 Thanks,

 Peter

 --
 --
 Peter M. Wolanin, Ph.D.
 Momentum Specialist,  Acquia. Inc.
 peter.wola...@acquia.com





-- 
--
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Full reindex needed if termVectors added to fields in schema?

2008-12-18 Thread Eric Kilby

hi,

I've successfully added fields to my schema.xml before, and been able to
incrementally keep indexing documents with just the new ones picking up the
fields.  This appears to be similar to the case of not including certain
fields in certain documents, as the other documents simply don't have them
until they're added.

I'm looking into testing a MoreLikeThis implementation, and have read on
here that termVectors are needed to make it run acceptably.  I'd like to
rebuild my index, but that will take some time given the number of documents
involved, and I'd like to keep incremental updates running at the same time. 
The constraint is on the database side not the SOLR indexing side, so
improvements to indexing performance aren't my main concern here.  

So, my question is whether adding termVectors=true to a couple of schema
fields will work similarly to adding new fields, where the updated documents
will get the vectors added and the others won't get them but will continue
to work, allowing me to rebuild in the background while not breaking
anything in my existing incremental update/release cycle.
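For reference, term vectors are switched on per field in schema.xml; a sketch (the field name and type here are illustrative, not Eric's actual schema):

```xml
<!-- schema.xml: store term vectors for MoreLikeThis; only documents
     indexed after this change will carry them -->
<field name="bodytext" type="text" indexed="true" stored="true"
       termVectors="true"/>
```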

I appreciate your help.

Eric Kilby

-- 
View this message in context: 
http://www.nabble.com/Full-reindex-needed-if-termVectors-added-to-fields-in-schema--tp21081315p21081315.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Change in config file (synonym.txt) requires container restart?

2008-12-18 Thread Sagar Khetkade

But I am using CommonsHttpSolrServer for the Solr server configuration, as
it accepts the URL. So here, how can I reload the core?

-Sagar

 Date: Thu, 18 Dec 2008 07:55:02 -0500
 From: markrmil...@gmail.com
 To: solr-user@lucene.apache.org
 Subject: Re: Change in config file (synonym.txt) requires container restart?

 Sagar Khetkade wrote:
  Hi, I am using the SolrJ client to connect to the Solr 1.3 server, and the
  whole POC (doing a feasibility study) resides in a Tomcat web server. For
  any change I make in the synonym.txt file (adding a synonym) to take
  effect, I have to restart the Tomcat server. The synonym filter factories
  I am using are in both the index and query analyzers in schema.xml.
  Please tell me whether this approach is good, or if there is any other
  way to make the change take effect in searching without restarting Tomcat.
  Thanks and Regards,
  Sagar Khetkade

 You can also reload the core.

 - Mark
_
Chose your Life Partner? Join MSN Matrimony FREE
http://in.msn.com/matrimony
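If the instance is set up as multicore (solr.xml), the core reload can be triggered over plain HTTP, which works even when the client only holds a CommonsHttpSolrServer URL (the host, port, and core name below are assumptions):

```
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
```

Reloading the core re-reads the config files, including synonyms.txt, without restarting Tomcat.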

Re: TermVectorComponent and SolrJ

2008-12-18 Thread Grant Ingersoll


On Dec 18, 2008, at 10:06 AM, Aleksander M. Stensby wrote:

Hello everyone, I've started to look at TermVectorComponent and I'm
experimenting with using the component in a sort of "top terms"
setting for a given query...
I was also looking at MLT and the interestingTerms, but I would like
to do a query, get say 10k results, and from those results return a
list of the top 10 terms or something similar...


I haven't really thought too much about it yet, but I was wondering if
anyone has done any work on making the term vector response
available in a simple manner with SolrJ yet? Or if this is planned?
(In the same sense as it is today with facets
(response.getFacetFields() etc.). Not that I can't manage to write
it myself, but I would reckon that more people than me would be
interested in this. I'd be more than happy to contribute if it is
wanted; I just wanted to check whether anyone has started on this already
or not.




I think this would be a welcome contribution.

-Grant

Re: Multi language search help

2008-12-18 Thread Grant Ingersoll


On Dec 18, 2008, at 6:25 AM, Sujatha Arun wrote:


Hi,
I am prototyping language search using Solr 1.3. I have 3 fields in the
schema: id, content and language.

I am indexing 3 PDF files; the languages are Foroyo, Chinese and
Japanese.

I use xpdf to convert the content of the PDFs to text and push the text
to Solr in the content field.

What is the analyzer that I need to use for the above?

Using the default text analyzer and posting this content to Solr, I am
not getting any results.

Does Solr support stemming for the above languages?


I'm not familiar with Foroyo, but there should be tokenizers/analysis
available for Chinese and Japanese.  Are you putting all three
languages into the same field?  If that is the case, you will need
some type of language detection piece that can choose the correct
analyzer.


How are your users searching?  That is, do you know the language they  
want to search in?  If so, then you can have a field for each language.


-Grant
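As a sketch of the field-per-language approach Grant describes, a schema.xml fragment might look roughly like the following. The field names and the Foroyo field type are illustrative assumptions; solr.CJKTokenizerFactory is the bigram tokenizer that ships with Solr and is usable for both Chinese and Japanese text.

```xml
<!-- Illustrative schema.xml sketch: one content field per language -->
<types>
  <fieldType name="text_cjk" class="solr.TextField">
    <analyzer>
      <!-- CJKTokenizer emits overlapping character bigrams -->
      <tokenizer class="solr.CJKTokenizerFactory"/>
    </analyzer>
  </fieldType>
</types>
<fields>
  <field name="content_zh" type="text_cjk" indexed="true" stored="true"/>
  <field name="content_ja" type="text_cjk" indexed="true" stored="true"/>
  <!-- Assumed: fall back to the default text type for Foroyo -->
  <field name="content_fo" type="text" indexed="true" stored="true"/>
</fields>
```

At index time the client routes each document's text into the field matching its language field, and at query time the application searches only the field for the user's language.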



Re: Data Import Request Handler problem: Odd performance behaviour for large number of records

2008-12-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH does not maintain any state between two runs. So if there is a
perf degradation, it could be because:
- Solr indexing is taking longer after you do a delete *:*
- Your RAM is insufficient (your machine is swapping)

On Fri, Dec 19, 2008 at 2:51 AM, Glen Newton glen.new...@gmail.com wrote:
 Hello,

 I am using Solr 1.4 (solr-2008-11-19) with Lucene 2.4 dropped in instead of 2.9.

 I am indexing 500k records using the JDBC Data Import Request Handler.

 Config:
  Linux openSUSE 10.2 (X86-64)
  Dual dual-core 64-bit Xeon 3GHz Dell blade, 8GB RAM
  java version 1.6.0_07
  Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
  Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
  1GB heap for Tomcat
  DB: MySql on separate but similar server

 I am finding that the when I do a Full-Import, followed by another
 Full-import the import takes much longer the second and subsequent
 times:
 Run1 = 0:27:31.491
 Run2 = 1:14:44.821
 Run3 = 1:14:48.316
 Run4 = 2:15:12.296
 Run5 = 1:37:06.847

 (I have run this ~10 times and got roughly the same results). I have
 also monitored the load on the Solr machine and the database machine
 for any other activity that might have an impact.

 The final Lucene index size is 923MB. The default clean='true', so
 the index is cleared (emptied) each time, so I am concerned that the
 second run takes four times as long as the first run.

 Am I doing something wrong here? Any help would be appreciated.

 I have appended my data-config.xml.

 thanks,

 Glen

 <dataConfig>
   <dataSource driver="com.mysql.jdbc.Driver"
     url="jdbc:mysql://blue01/dartejos" user="USER" password="PASSWD"/>
   <document name="products">
     <entity name="item" query="select Publisher.name as pub,
       Journal.title as jo, Article.rawUrl as textpath, Journal.issn,
       Volume.number as vol, Volume.coverYear as year, Issue.number as iss,
       Article.id, Article.title as ti, Article.abstract, Article.startPage as
       startPage, Article.endPage as endPage from Publisher, Journal, Volume,
       Issue, Article where Publisher.id = Journal.publisherId and Journal.id
       = Volume.journalId and Volume.id = Issue.volumeId and Issue.id =
       Article.issueId limit 50">
       <field column="id" name="id" />
       <field column="jo" name="id" />
       <field column="issn" name="id" />
       <field column="vol" name="id" />
       <field column="year" name="id" />
       <field column="iss" name="id" />
       <field name="abstract" column="abstract"/>
       <field name="title" column="title"/>
       <field name="pub" column="pub"/>
       <field name="textpath" column="textpath"/>
       <field name="startPage" column="startPage"/>
       <field name="endPage" column="endPage"/>
     </entity>
   </document>
 </dataConfig>

 --

 -




-- 
--Noble Paul


Re: Change in config file (synonym.txt) requires container restart?

2008-12-18 Thread Shalin Shekhar Mangar
Please note that a core reload will also stop Solr from serving any search
requests while the core is reloading.

On Fri, Dec 19, 2008 at 8:24 AM, Sagar Khetkade
sagar.khetk...@hotmail.comwrote:


 But I am using CommonsHttpSolrServer for the Solr server configuration, as it
 accepts the URL. So how can I reload the core from there?

 -Sagar

 Date: Thu, 18 Dec 2008 07:55:02 -0500
 From: markrmil...@gmail.com
 To: solr-user@lucene.apache.org
 Subject: Re: Change in config file (synonym.txt) requires container restart?

 Sagar Khetkade wrote:
  Hi,
  I am using the SolrJ client to connect to the Solr 1.3 server, and the whole
  POC (doing a feasibility study) resides in a Tomcat web server. Any change I
  make in the synonym.txt file to add a synonym requires a restart of the
  Tomcat server before it is reflected. The synonym filter factory that I am
  using is in both the index and query analyzers in schema.xml. Please tell me
  whether this approach is good, or if there is another way to make the change
  reflect in searches without restarting the Tomcat server.
  Thanks and Regards,
  Sagar Khetkade

 You can also reload the core.

 - Mark




-- 
Regards,
Shalin Shekhar Mangar.
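For reference, a core reload can be issued as a plain HTTP request to the CoreAdmin handler, so no special SolrJ support is needed; the host, port, and core name below are placeholders to adjust for your installation:

```
# Reload a named core via the CoreAdmin handler (multicore setups):
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0"
```

This assumes Solr is running with a multicore solr.xml; the same URL can also be opened from a browser or fetched from Java with any HTTP client.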


Re: Precisions on solr.xml about cross context forwarding.

2008-12-18 Thread Chris Hostetter

: This bothers me too.  I find it really strange that Solr's entry-point 
: is a servlet filter instead of a servlet.

it traces back to the need for it to decide when to handle a request and 
when to let it pass through (to a later filter, a servlet or a JSP)

this is the only way legacy support for the /select and /update urls work 
without forcing people to modify the web.xml; it's how a handler can be 
registered with the name /admin/foo even though /admin/ resolves to a JSP 
(and without forcing people to modify the web.xml); and it's what allows 
us to use the same core path prefixes for both handler requests and the 
Admin JSPs.

:  It is unnecessary, and potentially problematic, to have the
:  SolrDispatchFilter configured to also filter on forwards.  Do not configure
:  this dispatcher as <dispatcher>FORWARD</dispatcher>.
: 
: The problem is that if filters do not have this FORWARD thing, then
: cross context forwarding doesn't work.
: 
: Is there a workaround to this problem ?

You can try adding the FORWARD option, but the risk is that 
SolrRequestFilter could wind up forwarding to itself infinitely on some 
requests (depending on your configuration)...

http://www.nabble.com/Re%3A-svn-commit%3A-r640449lucene-solr-trunk-src-webapp-src-org-apache-solr-servlet-SolrDispatchFilter.java-p16262766.html



-Hoss
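For concreteness, the filter mapping under discussion looks roughly like this in Solr's web.xml (Servlet 2.4 dispatcher syntax); adding the FORWARD line is what the quoted warning advises against:

```xml
<filter-mapping>
  <filter-name>SolrRequestFilter</filter-name>
  <url-pattern>/*</url-pattern>
  <dispatcher>REQUEST</dispatcher>
  <!-- Risky: lets the filter see its own forwards, which can loop -->
  <dispatcher>FORWARD</dispatcher>
</filter-mapping>
```

This is an illustrative sketch, not a verbatim copy of the shipped web.xml; the exact filter name and patterns should be checked against your Solr distribution.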



Fwd: Distributed Searching - Limitations?

2008-12-18 Thread Pooja Verlani
Hi,
I am planning to use Solr's distributed searching for my project. But while
going through http://wiki.apache.org/solr/DistributedSearch, i found a few
limitations with it. Can anyone please explain the 2nd and 3rd points in the
limitations section of the page. The points are:

   - When duplicate doc IDs are received, Solr chooses the first doc and
     discards subsequent ones
   - No distributed idf

Thanks.
Regards,
Pooja
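A minimal sketch of what those two limitations mean in practice. The shard contents, document frequencies, and the merge function are made-up illustrations, not Solr internals; the idf formula is the classic Lucene-style one:

```python
import math

# --- Limitation 2: duplicate doc IDs across shards ---
# When the same uniqueKey appears on more than one shard, Solr keeps the
# first occurrence and drops the rest, roughly like this merge:
def merge_shard_docs(*shard_results):
    seen, merged = set(), []
    for docs in shard_results:
        for doc in docs:
            if doc["id"] not in seen:
                seen.add(doc["id"])
                merged.append(doc)
    return merged

shard_a = [{"id": "1"}, {"id": "2"}]
shard_b = [{"id": "2"}, {"id": "3"}]   # id "2" is a duplicate
merged = merge_shard_docs(shard_a, shard_b)

# --- Limitation 3: no distributed idf ---
# Each shard scores with its own local idf rather than a global one, so the
# same term can be weighted very differently per shard. Numbers are made up.
def idf(num_docs, doc_freq):
    # Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

idf_shard_a = idf(1000, 10)    # term is rare on shard A -> high idf
idf_shard_b = idf(1000, 500)   # term is common on shard B -> low idf
idf_global = idf(2000, 510)    # what a single combined index would use

print(len(merged), idf_shard_a, idf_shard_b, idf_global)
```

The gap between the per-shard idf values and the global one is why a document on a small or skewed shard can be ranked quite differently than it would be in a single unsharded index.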