Re: Setting termInfosIndexDivisor and Interval?

2009-07-20 Thread Shalin Shekhar Mangar
On Mon, Jul 20, 2009 at 8:04 AM, Jason Rutherglen 
jason.rutherg...@gmail.com wrote:

 Are we currently supporting this or in 1.4? (i.e.
 IndexReader.open and IndexWriter.setTermIndexInterval) It's
 useful for trie range, shingles, etc, where many terms are
 potentially created.


No, we don't support it currently, but we should. Let's open an issue.

-- 
Regards,
Shalin Shekhar Mangar.


Help needed with Solr maxBooleanClauses

2009-07-20 Thread dipanjan_pramanick
Hi,
We have a scenario where we need to send more than 1024 ids in the Solr URL as 
an OR condition.
I have changed the value of maxBooleanClauses in solrconfig.xml to 2048, but 
it is still failing after 1024 OR conditions.
Solr is throwing SEVERE: org.apache.solr.common.SolrException: Bad Request 
whenever I send more than 1024 OR conditions. Is there any way I can 
change this value in the Solr configuration?


Thanks,
Dipanjan


How to configure Solr in Glassfish ?

2009-07-20 Thread huenzhao

I want to use Glassfish as the Solr search server, but I don't know how to
configure it.
Does anybody know how?

enzhao...@gmail.com 
Thanks!

-- 
View this message in context: 
http://www.nabble.com/How-to-configure-Solr--in-Glassfish---tp24565758p24565758.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Help needed with Solr maxBooleanClauses

2009-07-20 Thread Shalin Shekhar Mangar
On Mon, Jul 20, 2009 at 1:37 PM, dipanjan_pramanick 
dipanjan_praman...@infosys.com wrote:

 Hi,
 We have scenario where we need to send more than 1024 ids in the Solr url
 as OR condition.
 I have changed the value of maxBooleanClauses in solrconfig.xml, to 2048,
 but it is failing after handling 1024 OR conditions.
 Solr is throwing SEVERE: org.apache.solr.common.SolrException: Bad
 Request whenever I am sending more than 1024 OR conditions. Is there any
 way I can change this value on Solr configuration.


The maxBooleanClauses limit is there as a safeguard against extremely slow
queries. If you can tell us about the exact problem you are solving, we may
be able to suggest an alternative approach. Creating such huge boolean
clauses may be a bad design choice.

As for the exception you are seeing, it seems to me that you may be
exceeding the size of a GET request. Using an HTTP POST request may work.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Help needed with Solr maxBooleanClauses

2009-07-20 Thread dipanjan_pramanick
Hi Shalin,
Thanks for your time to respond to this issue.

It's true that there is a design flaw, because of which we need to support a huge 
list of OR conditions through Solr.
But I would still like to know if there is any configuration other than 
the one in solrconfig.xml through which we can pass more than 1024 OR 
conditions.

<maxBooleanClauses>1024</maxBooleanClauses>


Regarding HTTP POST: Solr 1.3 only accepts the request in URL form, not as a 
request parameter or request object. That's another issue. Hence we need to 
send the query in URL form only.


Thanks,
Dipanjan



From: Shalin Shekhar Mangar shalinman...@gmail.com
Reply-To: solr-user@lucene.apache.org
Date: Mon, 20 Jul 2009 13:58:55 +0530
To: solr-user@lucene.apache.org
Subject: Re: Help needed with Solr maxBooleanClauses

On Mon, Jul 20, 2009 at 1:37 PM, dipanjan_pramanick 
dipanjan_praman...@infosys.com wrote:

 Hi,
 We have scenario where we need to send more than 1024 ids in the Solr url
 as OR condition.
 I have changed the value of maxBooleanClauses in solrconfig.xml, to 2048,
 but it is failing after handling 1024 OR conditions.
 Solr is throwing SEVERE: org.apache.solr.common.SolrException: Bad
 Request whenever I am sending more than 1024 OR conditions. Is there any
 way I can change this value on Solr configuration.


The maxBooleanClauses is there as a safe guard against extremely slow
queries. If you can tell us about the exact problem you are solving, we may
be able to suggest an alternative approach? Creating such huge boolean
clauses may be a bad design choice.

As for the exception you are seeing, it seems to me that you may be
exceeding the size of a GET request. Using an HTTP POST request may work.

--
Regards,
Shalin Shekhar Mangar.


 CAUTION - Disclaimer *
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely
for the use of the addressee(s). If you are not the intended recipient, please
notify the sender by e-mail and delete the original message. Further, you are
not to copy, disclose, or distribute this e-mail or its contents to any other
person, and any such actions are unlawful. This e-mail may contain viruses.
Infosys has taken every reasonable precaution to minimize this risk, but is
not liable for any damage you may sustain as a result of any virus in this
e-mail. You should carry out your own virus checks before opening the e-mail
or attachment. Infosys reserves the right to monitor and review the content of
all messages sent to or from this e-mail address. Messages sent to or from
this e-mail address may be stored on the Infosys e-mail system.
***INFOSYS End of Disclaimer INFOSYS***


Re: Help needed with Solr maxBooleanClauses

2009-07-20 Thread Shalin Shekhar Mangar
On Mon, Jul 20, 2009 at 2:12 PM, dipanjan_pramanick 
dipanjan_praman...@infosys.com wrote:


 Its true that there is a design flaw, because of what we need to support a
 huge list of OR conditions through Solr.
 But still I would like to know if there is any other configuration other
 than the one in solrConfig.xml, through which we can pass more than 1024 OR
 conditions.

 <maxBooleanClauses>1024</maxBooleanClauses>


 Regarding HTTP Post, in Solr 1.3, it is only accepts request in url form
 not as request parameter or request object. That's another issue. Hence we
 need to send the query in url form only.


Changing the value of maxBooleanClauses in solrconfig.xml is sufficient. The
problem here is that you may be exceeding the maximum allowed size of an
HTTP GET request (is that 2KB?). You must use a POST request to send such a
huge query string. Again, it will help if you can post the complete stack
trace of the error.
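On the client side, the long clause list can be assembled programmatically and then sent in the body of a POST rather than in the URL. A minimal sketch of the string-building step (the field name "id" and the helper below are illustrative, not from this thread):

```java
import java.util.List;

// Build a single fielded OR query from a list of ids. With thousands of ids
// the resulting string easily exceeds typical GET URL limits, so it should be
// sent as the body of an HTTP POST to the select handler instead of the URL.
public class OrQueryBuilder {
    public static String orQuery(String field, List<String> ids) {
        // e.g. orQuery("id", [1, 2, 3]) produces id:(1 OR 2 OR 3)
        return field + ":(" + String.join(" OR ", ids) + ")";
    }
}
```

SolrJ clients can typically submit such a query using the POST method instead of GET; check your client version for the exact call.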

-- 
Regards,
Shalin Shekhar Mangar.


Confusion around Binary/XML in SolrJ

2009-07-20 Thread Code Tester
I am using Solr 1.4 dev in a multicore setup. Each of my cores'
solrconfig.xml has the following lines:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/javabin"
class="solr.BinaryUpdateRequestHandler" />

I am using SolrJ as an EmbeddedSolrServer. When I try to add a POJO (with
@Field annotations), the data does not get indexed. Whereas, if I use the
SolrInputDocument way, the data gets indexed.

PS: Both ways I am adding data using addBean/add and then commit followed by
optimize

PPS: The final intention is that all the indexing and searching needs to be
done in the binary format since I am running on a single machine.

Could someone provide insights on this issue ?

Thanks!


Re: Confusion around Binary/XML in SolrJ

2009-07-20 Thread Code Tester
Another observation:

I am even unable to delete documents using the EmbeddedSolrServer ( on a
specific core )

Steps:

1) I have 2 cores ( core0 , core1 ) Each of them have ~10 records.

2) System.setProperty("solr.solr.home", "/home/user/projects/solr/example/multi");
File home = new File("/home/user/projects/solr/example/multi");
File f = new File(home, "solr.xml");
CoreContainer coreContainer = new CoreContainer();
coreContainer.load("/home/user/projects/solr/example/multi", f);
SolrServer server = new EmbeddedSolrServer(coreContainer, "core1");

server.deleteByQuery("*:*");
server.commit();
server.optimize();

3) When I check the status using
http://localhost:8983/solr/admin/cores?action=STATUS , I still see the same
number of numDocs.

4) If I try deleting using CommonsHttpSolrServer, it works fine:
String url = "http://localhost:8983/solr/core1";
CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
server.setSoTimeout(1000); // socket read timeout
server.setConnectionTimeout(100);
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false); // defaults to false
server.setAllowCompression(true);
server.setMaxRetries(1); // defaults to 0.  1 not recommended.
server.setRequestWriter(new BinaryRequestWriter());

server.deleteByQuery("*:*");
server.commit();
server.optimize();

Thanks!

On Mon, Jul 20, 2009 at 3:26 PM, Code Tester 
codetester.codetes...@gmail.com wrote:

 I am using solr 1.4 dev in a multicore way.  Each of my core's
 solrconfig.xml has the following lines

 <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
 <requestHandler name="/update/javabin"
 class="solr.BinaryUpdateRequestHandler" />

 I am using SolrJ as EmbeddedSolrServer. When I try to add a POJO ( with
 @Field annotations ), the data does not get indexed. Where as, if I use
 SolrInputDocument way, the data gets indexed.

 PS: Both ways I am adding data using addBean/add and then commit followed
 by optimize

 PPS: The final intention is that all the indexing and searching needs to be
 done in the binary format since I am running on a single machine.

 Could someone provide insights on this issue ?

 Thanks!






Re: Help needed with Solr maxBooleanClauses

2009-07-20 Thread dipanjan_pramanick
Hi Shalin,

We just found that there is no limit on the Solr side on the maximum number of 
boolean clauses. We have set <maxBooleanClauses>2048</maxBooleanClauses> and we 
are able to send about 1574 OR conditions.
Over that limit, we are getting HTTP/1.1 400 Bad Request.

You are correct, it's not a Solr issue; it's due to HTTP GET not being able 
to send such a large request.

But now the question is that Solr only accepts the request in URL form, not as a 
request parameter or request object. That's the main issue. Hence we need to send 
the query in URL form only.
Can you please suggest whether you have tried a similar thing: instead of passing 
the query in the URL, passing it as an object or request entity using POST?



Thanks,
Dipanjan



From: Shalin Shekhar Mangar shalinman...@gmail.com
Reply-To: solr-user@lucene.apache.org
Date: Mon, 20 Jul 2009 14:54:04 +0530
To: solr-user@lucene.apache.org
Subject: Re: Help needed with Solr maxBooleanClauses

On Mon, Jul 20, 2009 at 2:12 PM, dipanjan_pramanick 
dipanjan_praman...@infosys.com wrote:


 Its true that there is a design flaw, because of what we need to support a
 huge list of OR conditions through Solr.
 But still I would like to know if there is any other configuration other
 than the one in solrConfig.xml, through which we can pass more than 1024 OR
 conditions.

 <maxBooleanClauses>1024</maxBooleanClauses>


 Regarding HTTP Post, in Solr 1.3, it is only accepts request in url form
 not as request parameter or request object. That's another issue. Hence we
 need to send the query in url form only.


Changing the value of maxBooleanClauses in solrconfig.xml is sufficient. The
problem here is that you may be exceeding the maximum allowed size of an
HTTP GET request (is that 2KB?). You must use POST request to send such a
huge query string. Again, it will help if you can post the complete stack
trace of the error.

--
Regards,
Shalin Shekhar Mangar.




post error - ERROR:unknown field 'title'

2009-07-20 Thread rossputin

Hi guys.

I have two different Solr versions as I am evaluating nightly builds. On a
more recent one (I think 15th July) I am getting the following error:

ERROR:unknown field 'title'

I am posting to 'solr/update/extract' with the following:

curl
"http://localhost:8983/solr/update/extract?ext.literal.id=1&ext.literal.code=somecode&ext.literal.url=someurl/file.pdf&ext.literal.category=somecat&ext.literal.updated=2009-06-01T09:10:30.000Z&ext.idx.attr=true&ext.def.fl=text"
-F myfi...@1411_9.pdf

My schema does not, and is not intended to, contain a 'title' field.

Thanks in advance for your help,

 -- Ross
-- 
View this message in context: 
http://www.nabble.com/post-error---ERROR%3Aunknown-field-%27title%27-tp24567235p24567235.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Confusion around Binary/XML in SolrJ

2009-07-20 Thread Code Tester
Sorry everyone. Found the issue. It was because of a very stupid assumption.

My code and Solr were running as 2 different processes! (The weird part is that
when I ran the code using EmbeddedSolrServer, it did not throw any exception
saying there was already a server running on that port.)

Thanks!

On Mon, Jul 20, 2009 at 3:41 PM, Code Tester 
codetester.codetes...@gmail.com wrote:

 Another observation:

 I am even unable to delete documents using the EmbeddedSolrServer ( on a
 specific core )

 Steps:

 1) I have 2 cores ( core0 , core1 ) Each of them have ~10 records.

 2) System.setProperty("solr.solr.home", "/home/user/projects/solr/example/multi");
 File home = new File("/home/user/projects/solr/example/multi");
 File f = new File(home, "solr.xml");
 CoreContainer coreContainer = new CoreContainer();
 coreContainer.load("/home/user/projects/solr/example/multi", f);
 SolrServer server = new EmbeddedSolrServer(coreContainer, "core1");

 server.deleteByQuery("*:*");
 server.commit();
 server.optimize();

 3) When I check the status using
 http://localhost:8983/solr/admin/cores?action=STATUS , I still see same
 number of numDocs.

 4) If I try deleting using CommonsHttpSolrServer, it works fine
 String url = "http://localhost:8983/solr/core1";
 CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
 server.setSoTimeout(1000); // socket read timeout
 server.setConnectionTimeout(100);
 server.setDefaultMaxConnectionsPerHost(100);
 server.setMaxTotalConnections(100);
 server.setFollowRedirects(false); // defaults to false
 server.setAllowCompression(true);
 server.setMaxRetries(1); // defaults to 0.  1 not recommended.
 server.setRequestWriter(new BinaryRequestWriter());

 server.deleteByQuery("*:*");
 server.commit();
 server.optimize();

 Thanks!


 On Mon, Jul 20, 2009 at 3:26 PM, Code Tester 
 codetester.codetes...@gmail.com wrote:

 I am using solr 1.4 dev in a multicore way.  Each of my core's
 solrconfig.xml has the following lines

 <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
 <requestHandler name="/update/javabin"
 class="solr.BinaryUpdateRequestHandler" />

 I am using SolrJ as EmbeddedSolrServer. When I try to add a POJO ( with
 @Field annotations ), the data does not get indexed. Where as, if I use
 SolrInputDocument way, the data gets indexed.

 PS: Both ways I am adding data using addBean/add and then commit followed
 by optimize

 PPS: The final intention is that all the indexing and searching needs to
 be done in the binary format since I am running on a single machine.

 Could someone provide insights on this issue ?

 Thanks!








Solr and UIMA

2009-07-20 Thread JCodina

We are starting to use UIMA as a platform to analyze text.
The result of analyzing a document is a UIMA CAS. A CAS is a generic data
structure that can contain different data.
UIMA processes single documents. It gets the documents from a CAS producer,
processes them using a pipe that the user defines, and finally sends the
result to a CAS consumer, which saves or stores the result.
The pipe is thus a chain of different tools that annotate the text with
different information. Different sets of tools are available out there, each
of them defining its own data types that are included in the CAS. To form a
pipe, the output and input CAS of the elements to connect need to be
compatible.

There is a CAS consumer that feeds a Lucene index, called Lucas, but I was
looking at it, and I prefer to use UIMA connected to Solr. Why?
A: I know Solr ;-) and I like it.
B: I can configure the fields and their processing in Solr using XML. Once
done, I have it ready to use with a set of tools that allow me to easily
explore the data.
C: It is easier to use Solr as a web service that may receive docs from
different UIMAs (natural language processing is CPU intensive).
D: Break things down. The CAS would only produce XML that Solr can process.
Then different tokenizers can be used to deal with the data in the CAS. The
main point is that the XML has the doc and field labels of Solr.
E: The set of capabilities to process the XML is defined in XML, similarly to
Lucas to define the output, and in the Solr schema to define how this is
processed.


I want to use it in order to index something that is common but that I can't
get any tool to do with Solr: indexing a word and coding at the same
position the syntactic and semantic information. I know that in Lucene this
is evolving and it will be possible to include metadata, but for the moment...

So, my idea is first to produce a UIMA CAS consumer that performs the POST
of an XML file containing the plain text of the document to Solr; then
try to modify this in order to include multiple fields and start coding the
semantic information.

So, before starting, I would like to know your opinions and whether anyone is
interested in collaborating, or has some code that can be integrated into
this.
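As a starting point for such a CAS consumer, the output step could render each document as the <add><doc> XML that Solr's /update handler accepts. A rough sketch under that assumption (field names and the hand-rolled escaping are illustrative; a real consumer would use a proper XML library and POST the result to the Solr update URL):

```java
import java.util.Map;

// Render one document as Solr <add><doc> XML.
// Minimal escaping only; illustrative, not production-ready.
public class SolrAddXml {
    static String escape(String s) {
        // Escape & first so we don't double-escape the entities we emit.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(e.getKey()).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }
}
```

The resulting string is exactly what one would POST to http://host:port/solr/update before issuing a commit.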
 
-- 
View this message in context: 
http://www.nabble.com/Solr-and-UIMA-tp24567504p24567504.html
Sent from the Solr - User mailing list archive at Nabble.com.



Wildcards at the Beginning of a Search.

2009-07-20 Thread Jörg Agatz
Hello Solr users,

I tried to search with a wildcard at the beginning of a search term.

For example, I search for *est and want to get test, vogelnest, fest, ...
But it doesn't work; I always get an error.

Now my big brother Google tells me that it can work, but a search with a
wildcard at the beginning takes a long time.

Now I want to test it. But how?


Re: Help needed with Solr maxBooleanClauses

2009-07-20 Thread Avlesh Singh
If yours is a Java application stack, I would recommend moving to SolrJ.
It is a client API which lets you talk to Solr. Learn more about it here -
http://wiki.apache.org/solr/Solrj
Client APIs for other languages can be found here -
http://wiki.apache.org/solr/#head-ab1768efa59b26cbd30f1acd03b633f1d110ed47

Cheers
Avlesh

On Mon, Jul 20, 2009 at 3:44 PM, dipanjan_pramanick 
dipanjan_praman...@infosys.com wrote:

 Hi Shalin,

 We just found that there is no limit on Solr side about the maximum boolean
 condition. We have set <maxBooleanClauses>2048</maxBooleanClauses> and
 we are able to send about 1574 OR conditions.
 Over that limit, we are getting HTTP/1.1 400 Bad Request.

 You are correct, it's not a Solr issue, its due to HTTP GET is not being
 able to send such a large request.

 But now the question is, Solr only accepts request in url form not as
 request parameter or request object. That's the main issue. Hence we need to
 send the query in url form only.
 Can you please suggest, if you have tried the similar thing instead of
 passing as URL, instead passing as object or request entity using POST.



 Thanks,
 Dipanjan


 
 From: Shalin Shekhar Mangar shalinman...@gmail.com
 Reply-To: solr-user@lucene.apache.org
 Date: Mon, 20 Jul 2009 14:54:04 +0530
 To: solr-user@lucene.apache.org
 Subject: Re: Help needed with Solr maxBooleanClauses

 On Mon, Jul 20, 2009 at 2:12 PM, dipanjan_pramanick 
 dipanjan_praman...@infosys.com wrote:

 
  Its true that there is a design flaw, because of what we need to support
 a
  huge list of OR conditions through Solr.
  But still I would like to know if there is any other configuration other
  than the one in solrConfig.xml, through which we can pass more than 1024
 OR
  conditions.
 
  <maxBooleanClauses>1024</maxBooleanClauses>
 
 
  Regarding HTTP Post, in Solr 1.3, it is only accepts request in url form
  not as request parameter or request object. That's another issue. Hence
 we
  need to send the query in url form only.
 
 
 Changing the value of maxBooleanClauses in solrconfig.xml is sufficient.
 The
 problem here is that you may be exceeding the maximum allowed size of an
 HTTP GET request (is that 2KB?). You must use POST request to send such a
 huge query string. Again, it will help if you can post the complete stack
 trace of the error.

 --
 Regards,
 Shalin Shekhar Mangar.





Re: Confusion around Binary/XML in SolrJ

2009-07-20 Thread Erik Hatcher


On Jul 20, 2009, at 6:11 AM, Code Tester wrote:
I am even unable to delete documents using the EmbeddedSolrServer  
( on a

specific core )

Steps:

1) I have 2 cores ( core0 , core1 ) Each of them have ~10 records.

2) System.setProperty("solr.solr.home", "/home/user/projects/solr/example/multi");
   File home = new File("/home/user/projects/solr/example/multi");
   File f = new File(home, "solr.xml");
   CoreContainer coreContainer = new CoreContainer();
   coreContainer.load("/home/user/projects/solr/example/multi", f);
   SolrServer server = new EmbeddedSolrServer(coreContainer, "core1");

   server.deleteByQuery("*:*");
   server.commit();
   server.optimize();

3) When I check the status using
http://localhost:8983/solr/admin/cores?action=STATUS , I still see  
same


Assuming both of these point to the same Solr home and data directory,  
a commit is needed on the HTTP process in order for it to pick up  
changes made to the underlying index (that occurred without its  
knowledge).


Erik



Re: Wildcards at the Beginning of a Search.

2009-07-20 Thread Erik Hatcher
See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently  
does not have leading wildcard support enabled.


Erik

On Jul 20, 2009, at 8:09 AM, Jörg Agatz wrote:


Hallo Solr Users...

I tryed to search with a Wildcard at the beginning from a search.

for example, i will search for *est and get test, vogelnest,  
fest, 

But it dosent work, i alsways get an error...

Now my Big brother GOOGLE tolds me, that it can work but a search  
with a

Wildcad at the beginning need a long time...

Now i will test ist. but How?




Posting multiple documents at once - clarification

2009-07-20 Thread Vannia Rajan
Hi,

  When we post a file with a number of documents in the format shown below
to Solr, if there is some error in one of the docs, then all the docs
in the file are errored out and not added to the Solr index.

<?xml version="1.0"?>
<add>
 <doc>
  ...
 </doc>
 <doc>
  ...
 </doc>
</add>

Is there any way by which we can tell Solr to skip only the docs that
have the actual error? Or do we need to post each doc in a separate
file to achieve granularity in adding?

-- 
Thanks,
Vanniarajan


Re: Posting multiple documents at once - clarification

2009-07-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
If the error is an XML parsing error, there is no way of continuing
from that point. Even otherwise, Solr assumes that if the whole
payload is not correct, it is to be discarded.
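Given that behaviour, a common client-side workaround is to try the batch first and, only on failure, re-send the documents one at a time to isolate the bad ones. A sketch of that pattern (the Indexer interface is a stand-in for whatever client you use, e.g. a SolrJ add() call; names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Fallback pattern: one request for the whole batch; on failure, re-send
// each document individually and collect only the ones that are truly bad.
public class BatchFallback {
    public interface Indexer {
        void add(List<String> docs) throws Exception;
    }

    /** Returns the documents that failed even when sent individually. */
    public static List<String> addWithFallback(Indexer indexer, List<String> docs) {
        List<String> failed = new ArrayList<>();
        try {
            indexer.add(docs);              // fast path: single batched request
        } catch (Exception batchError) {
            for (String doc : docs) {       // slow path: isolate bad documents
                try {
                    indexer.add(List.of(doc));
                } catch (Exception e) {
                    failed.add(doc);
                }
            }
        }
        return failed;
    }
}
```

This trades one extra round of requests on failure for not losing the whole payload.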

On Mon, Jul 20, 2009 at 6:32 PM, Vannia Rajankvanniara...@gmail.com wrote:
 Hi,

  When we post a file with a number of documents of the format shown below
 to solr, if there is some 'error' in one of the doc, then all the docs
 in the file are error-ed out and not added to the Solr-index.

        <?xml version="1.0"?>
        <add>
             <doc>
                  ...
             </doc>
             <doc>
                  ...
             </doc>
         </add>

    Is there any way by which we can tell solr to skip only the docs that
 have the actual error? or should we need to post each doc in a separate
 file to achieve granularity in adding?

 --
 Thanks,
 Vanniarajan




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


RE: Word frequency count in the index

2009-07-20 Thread Daniel Alheiros
Hi Wunder,

Thanks for your reply!

I take your point. It has to be appropriate to your content. In the
cases I deal with, using stop words wouldn't be a big deal because the
documents we handle are usually proper articles (although titles could
still be impacted by it).

I based my stop words on the most frequent terms I could find in my
index when I indexed my whole database. I'm sure it will change over
time, but idf would deal with the rest. I'm inclined to keep it like
this, but maybe some tests and real query analysis would be good. I will
let you know if any interesting patterns emerge.

Cheers,
Daniel 

-Original Message-
From: Walter Underwood [mailto:wunderw...@netflix.com] 
Sent: 16 July 2009 17:15
To: solr-user@lucene.apache.org
Subject: Re: Word frequency count in the index

I haven't researched old versions of Lucene, but I think it has always
been a vector space, tf.idf engine. I don't see any hint of
probabilistic scoring.

A bit of background about stop words and idf. They are two versions of
the same thing.

Stop words are a manual, on/off decision about which words are important.
That decision is high risk and easy to get wrong. We have a movie titled
"To be and to have". Oops.

Inverse document frequency (idf) replaces that on/off control with a
proportional weight calculated from the index. For Netflix, that means
that "weeds: season 2" has a high weight for "weeds" and lower weights
for "season" and "2".

In my control theory course, my professor told me to only use
proportional control when on/off didn't work. Well, stop words don't
work and idf does.
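To make the proportionality concrete, here is a sketch using the classic Lucene DefaultSimilarity-style formulas (tf = sqrt(freq), idf = 1 + ln(numDocs/(docFreq+1))); the exact formula can differ between Lucene versions, so treat this as illustrative only:

```java
// Illustrative tf.idf weighting in the style of Lucene's DefaultSimilarity.
public class TfIdf {
    public static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    public static double idf(int numDocs, int docFreq) {
        // A word in nearly every document gets an idf near 1 (low weight)
        // instead of being cut off entirely - proportional, not on/off.
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static double weight(int termFreq, int numDocs, int docFreq) {
        return tf(termFreq) * idf(numDocs, docFreq);
    }
}
```

In a 1000-document index, a term appearing in 999 documents gets idf 1.0, while one appearing in 9 documents gets about 5.6 — which is how "weeds" outweighs "season" and "2" without any stop list.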

For a longer list of movie titles entirely made of stop words, go here:

http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html

wunder

On 7/16/09 8:50 AM, Daniel Alheiros daniel.alhei...@bbc.co.uk wrote:

 Hi Walter,
 
 Has it always been there? Which version of Lucene are we talking
about?
 
 Regards,
 Daniel
 
 -Original Message-
 From: Walter Underwood [mailto:wunderw...@netflix.com]
 Sent: 16 July 2009 15:04
 To: solr-user@lucene.apache.org
 Subject: Re: Word frequency count in the index
 
 Lucene uses a tf.idf relevance formula, so it automatically finds 
 common words (stop words) in your documents and gives them lower 
 weight. I recommend not removing stop words at all and letting Lucene 
 handle the weighting.
 
 wunder
 
 On 7/16/09 3:29 AM, Pooja Verlani pooja.verl...@gmail.com wrote:
 
 Hi,
 
 Is there any way in SOLR to know the count of each word indexed in 
 the
 
 solr ?
 I want to find out the different word frequencies to figure out '
 application specific stop words'.
 
 Please let me know if its possible.
 
 Thank you,
 Regards,
 Pooja
 
 
 http://www.bbc.co.uk/
 This e-mail (and any attachments) is confidential and may contain 
 personal views which are not the views of the BBC unless specifically
stated.
 If you have received it in error, please delete it from your system.
 Do not use, copy or disclose the information in any way nor act in 
 reliance on it and notify the sender immediately.
 Please note that the BBC monitors e-mails sent or received.
 Further communication will signify your consent to this.
 





method inform of SolrCoreAware called 2 times

2009-07-20 Thread Marc Sturlese

Hey there,
I have implemented a custom component which extends SearchComponent and
implements SolrCoreAware.
I have declared it in solrconfig.xml as:
 <searchComponent name="mycomp" class="solr.MyCustomComponent" />

And added it in my SearchHandler as:
 <arr name="last-components">
   <str>mycomp</str>
 </arr>

I am using multicore with two cores.
I have noticed (doing some logging) that the inform method (the one that
implements SolrCoreAware) is being called 2 times per core when I start
my Solr instance. As I understood it, the SolrCoreAware inform method should
be called just once per core. Am I right, or is it normal that it is called 2
times per core?
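Independent of whether the double callback is expected, a component can make its one-time setup idempotent so a repeated inform() is harmless. A defensive sketch in plain Java (Solr types replaced by Object for illustration):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Guard one-time setup so calling inform() more than once is harmless.
public class OnceGuard {
    private final AtomicBoolean informed = new AtomicBoolean(false);
    private final AtomicInteger initCount = new AtomicInteger(0);

    public void inform(Object core) {
        // compareAndSet flips false -> true exactly once, so the setup body
        // runs at most one time even if the container calls inform() twice.
        if (informed.compareAndSet(false, true)) {
            initCount.incrementAndGet(); // stands in for real setup work
        }
    }

    public int timesInitialized() {
        return initCount.get();
    }
}
```

With multicore, each core would get its own component instance, so a per-instance guard like this is enough.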



-- 
View this message in context: 
http://www.nabble.com/method-inform-of-SolrCoreAware-callled-2-times-tp24570221p24570221.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Wildcards at the Beginning of a Search.

2009-07-20 Thread Jeff Newburn
There is a hacky way to do it if you can pull it off. You can prepend some
known prefix to the field, then strip it off when you get the results back.
An example would be putting Phone: in front of every value in a phone number
field; then instead of searching like this *-111- (which won't work), you
would search (Phone: *-111-). Keep in mind this way will work
syntactically, but it basically changes the index into a file sort, so you
will see a performance dip.
-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


 From: Erik Hatcher e...@ehatchersolutions.com
 Reply-To: solr-user@lucene.apache.org
 Date: Mon, 20 Jul 2009 08:20:15 -0400
 To: solr-user@lucene.apache.org
 Subject: Re: Wildcards at the Beginning of a Search.
 
 See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently
 does not have leading wildcard support enabled.
 
 Erik
 
 On Jul 20, 2009, at 8:09 AM, Jörg Agatz wrote:
 
 Hallo Solr Users...
 
 I tryed to search with a Wildcard at the beginning from a search.
 
 for example, i will search for *est and get test, vogelnest,
 fest, 
 But it dosent work, i alsways get an error...
 
 Now my Big brother GOOGLE tolds me, that it can work but a search
 with a
 Wildcad at the beginning need a long time...
 
 Now i will test ist. but How?
 



Re: Wildcards at the Beginning of a Search.

2009-07-20 Thread Reza Safari
Add setAllowLeadingWildcard(true); to the constructor of  
org.apache.solr.search.SolrQueryParser.java


Gr, Reza

On Jul 20, 2009, at 4:00 PM, Jeff Newburn wrote:

There is a hacky way to do it if you can pull it off.  You can  
prepend some
known prefix to the field then strip it off when you get the results  
back.
An example would be putting Phone: in front of every value in a  
phone number
field then instead of searching like this *-111- (which won't  
work) you

would search (Phone: *-111-).  Keep in mind this way will work
syntactically but basically changes the index into a file sort so  
you will

see a performance dip.
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562



From: Erik Hatcher e...@ehatchersolutions.com
Reply-To: solr-user@lucene.apache.org
Date: Mon, 20 Jul 2009 08:20:15 -0400
To: solr-user@lucene.apache.org
Subject: Re: Wildcards at the Beginning of a Search.

See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently
does not have leading wildcard support enabled.

Erik

On Jul 20, 2009, at 8:09 AM, Jörg Agatz wrote:


Hallo Solr users...

I tried to search with a wildcard at the beginning of a search term.

For example, I will search for *est and get test, vogelnest,
fest, ...
But it doesn't work, I always get an error...

Now my big brother GOOGLE told me that it can work, but a search
with a
wildcard at the beginning needs a long time...

Now I want to test it. But how?























Solr tika and posting .pst files

2009-07-20 Thread S.Selvam
Hi,

I am using Solr with Tika to post various files. When I try to post a .pst
file (Microsoft Outlook), the file is posted but it does not contain any
data. I could not find anything useful after googling.

Regarding the Solr schema, I use

  1) id
  2) content (this is the default field)

Do I need to configure Tika to be able to handle the .pst format? I would like
to hear your suggestions.

Note: 1) I use VB.NET as a front-end tool.
      2) Other files' contents are properly mapped to the content field.

-- 
Yours,
S.Selvam


RE: Wildcards at the Beginning of a Search.

2009-07-20 Thread Brian Klippel
Depending on how you are sending docs in for indexing, you could also add an 
additional field whose value is the string reverse of the primary value.  Then 
search that field with a trailing wildcard.



-Original Message-
From: Jeff Newburn [mailto:jnewb...@zappos.com] 
Sent: Monday, July 20, 2009 10:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Wildcards at the Beginning of a Search.

There is a hacky way to do it if you can pull it off.  You can prepend some
known prefix to the field then strip it off when you get the results back.
An example would be putting Phone: in front of every value in a phone number
field then instead of searching like this *-111- (which won't work) you
would search (Phone: *-111-).  Keep in mind this way will work
syntactically but basically changes the index into a file sort so you will
see a performance dip.
-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


 From: Erik Hatcher e...@ehatchersolutions.com
 Reply-To: solr-user@lucene.apache.org
 Date: Mon, 20 Jul 2009 08:20:15 -0400
 To: solr-user@lucene.apache.org
 Subject: Re: Wildcards at the Beginning of a Search.
 
 See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently
 does not have leading wildcard support enabled.
 
 Erik
 
 On Jul 20, 2009, at 8:09 AM, Jörg Agatz wrote:
 
 Hallo Solr users...
 
 I tried to search with a wildcard at the beginning of a search term.
 
 For example, I will search for *est and get test, vogelnest,
 fest, ...
 But it doesn't work, I always get an error...
 
 Now my big brother GOOGLE told me that it can work, but a search
 with a
 wildcard at the beginning needs a long time...
 
 Now I want to test it. But how?
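
For readers who want to try Brian's reversed-field idea, here is a minimal client-side sketch in Python. The field names `title`/`title_rev` and the document shape are placeholders for whatever your schema uses; the point is only the reversal trick that turns a leading wildcard into an efficient trailing wildcard:

```python
def reversed_copy(doc, src="title", dest="title_rev"):
    """Add a reversed copy of a field at index time so leading-wildcard
    searches can run as trailing-wildcard searches on the reversed field
    (field names 'title'/'title_rev' are hypothetical)."""
    doc = dict(doc)
    doc[dest] = doc[src][::-1]
    return doc

def leading_wildcard_query(term, dest="title_rev"):
    """Rewrite a '*est'-style term into a trailing-wildcard query
    against the reversed field, e.g. 'title_rev:tse*'."""
    if not term.startswith("*"):
        raise ValueError("expected a leading-wildcard term")
    return "%s:%s*" % (dest, term[1:][::-1])

doc = reversed_copy({"id": "1", "title": "vogelnest"})
print(doc["title_rev"])                # tsenlegov
print(leading_wildcard_query("*est"))  # title_rev:tse*
```

Both halves have to agree: the same reversal applied at index time must be applied to the query prefix, and results are read back from the normal field.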
 



Re: Posting multiple documents at once - clarification

2009-07-20 Thread Vannia Rajan
2009/7/20 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 If the error is an XML parsing error, there is no way of continuing
 from that point. Even otherwise, Solr assumes that if the whole
 payload is not correct, it is to be discarded.


Thank you for your response

-- 
Thanks,
Vanniarajan
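
Since the whole payload is discarded when any of it is bad, one client-side workaround is to post in batches and bisect a failing batch to isolate the offending documents. A sketch under that assumption, where `post_batch` is a placeholder for your actual HTTP POST to /update:

```python
def post_resilient(docs, post_batch):
    """Post docs via post_batch(list-of-docs), which raises on failure.
    On failure, bisect the batch so one bad document does not discard
    the whole payload; returns the list of docs that could not be posted.
    post_batch is a hypothetical stand-in for the real HTTP call."""
    failed = []
    try:
        post_batch(docs)
    except Exception:
        if len(docs) == 1:
            failed.extend(docs)          # isolated the bad document
        else:
            mid = len(docs) // 2
            failed += post_resilient(docs[:mid], post_batch)
            failed += post_resilient(docs[mid:], post_batch)
    return failed

# Demo with a fake poster that rejects any batch containing id "bad":
sent = []
def fake_post(batch):
    if any(d["id"] == "bad" for d in batch):
        raise RuntimeError("parse error")
    sent.extend(batch)

docs = [{"id": "a"}, {"id": "bad"}, {"id": "b"}]
print(post_resilient(docs, fake_post))  # [{'id': 'bad'}]
```

The cost is extra round trips on failure, but good documents still get indexed and the bad ones are pinpointed for inspection.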


Implementing related tags

2009-07-20 Thread James T
Hi,

I have a specific requirement for searching and am looking for some help from
the community on how to achieve it using Solr:

I need to index 1 million+ documents. Each document contains (among other
fields) 3 fields representing the categories that the doc belongs to. For
example (a very simplified case to make it easier to explain):

Doc 1
  Place: NY, Paris, Tokyo
  Authors: AuthorA, AuthorB, AuthorC, AuthorD
  Tags: tagA, tagB, ballon

Doc 2
  Place: Bangkok
  Authors: AuthorD
  Tags: tagZ

So each doc can contain multiple values for each of the above fields (place,
author, tags).

Now the search requirement is that, by constraining on one of the
values, I need to search the related fields.

Example: given the constraint Author: AuthorD, I need to search over the
search space:
Place: NY, Paris, Tokyo and Bangkok
Author: AuthorA, AuthorB, AuthorC
Tags: tagA, tagB, ballon and tagZ
(The above result is generated by the fact that every item in the result
has at least 1 doc in common with AuthorD.)

So as I am typing Ba, I need to get Ballon and Bangkok (these Tags and
Places have at least 1 doc where AuthorD also appears).

Is such a system possible to implement using Solr?

Thanks!
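
One approach that may fit this requirement is plain faceting: filter to the constraining value with fq, facet on the other fields, and restrict the returned facet values to the typed prefix with facet.prefix. A sketch that only builds the request URL; the host, core path, and field names are hypothetical, and lowercasing the prefix assumes the facet fields hold lowercased terms:

```python
from urllib.parse import urlencode

def related_facets_url(constraint_field, constraint_value, prefix,
                       facet_fields=("place", "authors", "tags"),
                       base="http://localhost:8983/solr/select"):
    """Build a Solr request that filters to docs matching the constraint
    and facets the other fields, restricted to the typed prefix via
    facet.prefix.  Only facet counts are requested (rows=0)."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),                      # we only want facet counts
        ("fq", "%s:%s" % (constraint_field, constraint_value)),
        ("facet", "true"),
        ("facet.mincount", "1"),            # drop zero-count values
        ("facet.prefix", prefix.lower()),   # assumes lowercased terms
    ]
    params += [("facet.field", f) for f in facet_fields]
    return base + "?" + urlencode(params)

print(related_facets_url("authors", "AuthorD", "Ba"))
```

Because the fq restricts the document set first, the facet counts naturally cover only values co-occurring with the constraint, which is exactly the "related" search space described above.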


Recommended Articles

2009-07-20 Thread Jeff Newburn
Does anyone have links to or books on recommended reading about search in general?
I would like to see some literature on larger search concepts and ideas.
-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562



Re: Recommended Articles

2009-07-20 Thread darren
http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Dstripbooks&field-keywords=search&x=0&y=0

 Does anyone have links or books to recommended reading on search in
 general.
 Would like to see some literature on larger search concepts and ideas.
 --
 Jeff Newburn
 Software Engineer, Zappos.com
 jnewb...@zappos.com - 702-943-7562





Re: Recommended Articles

2009-07-20 Thread Mark Miller

dar...@ontrenet.com wrote:

http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Dstripbooks&field-keywords=search&x=0&y=0

  

Does anyone have links or books to recommended reading on search in
general.
Would like to see some literature on larger search concepts and ideas.
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562





  

Check out: http://wiki.apache.org/lucene-java/InformationRetrieval

Some good stuff there, though I don't think it is updated often.

My favorite is this free gem:
http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html

--
- Mark

http://www.lucidimagination.com





Solr JMX and Cacti

2009-07-20 Thread Edward Capriolo
Hey all,

We have several deployments of Solr across our enterprise. Our largest
one is several GB, and when enough documents are added an OOM
exception occurs.

To debug this problem I have enabled JMX. My goal is to write some
cacti templates similar to the ones I have done for hadoop:
http://www.jointhegrid.com/hadoop/. The only cacti template for solr I
have found is old, broken, and uses curl and PHP to try to read
the values off the web interface. I have a few general
questions/comments and also would like to know how others are dealing
with this.

1) SNMP has counters/gauges. With JMX it is hard to know what a
variable is without watching it for a while. Some fields are obvious
(total_x, cumulative_x); it would be worthwhile to add some notes in the
MBEAN info to say "works like a counter" or "works like a gauge". This way a
network engineer like me does not have to go code surfing to figure
out how to graph them.

 Has anyone written up a list of what the attributes are, types, and
what they mean?

2) The values that are not counter style I am assuming are sampled,
what is the sampling rate and is it adjustable?

Any tips are helpful. Thank you,


Re: Solr JMX and Cacti

2009-07-20 Thread Ryan McKinley


On Jul 20, 2009, at 8:47 AM, Edward Capriolo wrote:


Hey all,

We have several deployments of Solr across our enterprise. Our largest
one is a several GB and when enough documents are added an OOM
exception is occurring.

To debug this problem I have enable JMX. My goal is to write some
cacti templates similar to the ones I have done for hadoop.
http://www.jointhegrid.com/hadoop/. The only cacti template for solr I
have found is old, broken and is using curl and PHP to try and read
the values off the web interface. I have a few general
questions/comments and also would like to know how others are dealing
with this.

1) SNMP has counters/gauges. With JMX it is hard to know what a
variable is without watching it for a while. Some fields are obvious,
(total_x) (cumulative_x) it is worth wild to add some notes in the
MBEAN info to say works like counter works like gauge. This way a
network engineer like me does not have to go code surfing to figure
out how to graph them.

Has anyone written up a list of what the attributes are, types, and
what they mean?

2) The values that are not counter style I am assuming are sampled,
what is the sampling rate and is it adjustable?

Any tips are helpful. Thank you,


Check:
http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/RequestHandlerBase.java

For cacti, you should probably ignore the two 'rate' based  
calculations as they are just derivatives:
lst.add("avgTimePerRequest", (float) totalTime / (float) this.numRequests);
lst.add("avgRequestsPerSecond", (float) numRequests*1000 / (float)(System.currentTimeMillis()-handlerStart));






SolrJ embedded server : error while adding document

2009-07-20 Thread Gérard Dupont
Hi SolR guys,

I'm starting to play with Solr after a few years with classic Lucene. I'm
trying to index a single document using the embedded server, but I got a
strange error which looks like an XML parsing problem (see trace hereafter). To
add details, this is a simple JUnit test which creates a single document and passes
it to the server in an ArrayList<SolrInputDocument>. The document only has 2
fields, id and text, as described in the configuration.

Jul 20, 2009 5:50:50 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: missing content stream
at
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:114)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:147)
at
org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
at
org.weblab_project.services.solr.SolrComponent.flushIndexBuffer(SolrComponent.java:132)
at
org.weblab_project.services.solr.SolrComponentTest.testAddOneDocument(SolrComponentTest.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:154)
at junit.framework.TestCase.runBare(TestCase.java:127)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at
org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
at
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

Jul 20, 2009 5:50:50 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=/update params={} status=500 QTime=6
Cannot flush the index buffer : Server error while adding documents

-- 
Gérard Dupont
Information Processing Control and Cognition (IPCC) - EADS DS
http://weblab-project.org

Document  Learning team - LITIS Laboratory


Re: SolrJ embedded server : error while adding document

2009-07-20 Thread Gérard Dupont
My mistake: a problem with the buffer I added. But it raises a question: does Solr
(using the embedded server) have its own buffering mechanism for indexing or not? I
guess not, but I might be wrong.

2009/7/20 Gérard Dupont ger.dup...@gmail.com

 Hi SolR guys,

 I'm starting to play with SolR after few years with classic Lucene. I'm
 trying to index a single document using the embedded server, but I got a
 strange error which looks like XML parsing problem (see trace hereafter). To
 add details, this is a simple Junit which create single document then pass
 it to the server in a ArraylistSolrInputDocument. The document only have 2
 fields id and text as it is described in the configuration.

 ul 20, 2009 5:50:50 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: missing content stream
 at
 org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:114)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
 at
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:147)
 at
 org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
 at
 org.weblab_project.services.solr.SolrComponent.flushIndexBuffer(SolrComponent.java:132)
 at
 org.weblab_project.services.solr.SolrComponentTest.testAddOneDocument(SolrComponentTest.java:66)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at junit.framework.TestCase.runTest(TestCase.java:154)
 at junit.framework.TestCase.runBare(TestCase.java:127)
 at junit.framework.TestResult$1.protect(TestResult.java:106)
 at junit.framework.TestResult.runProtected(TestResult.java:124)
 at junit.framework.TestResult.run(TestResult.java:109)
 at junit.framework.TestCase.run(TestCase.java:118)
 at junit.framework.TestSuite.runTest(TestSuite.java:208)
 at junit.framework.TestSuite.run(TestSuite.java:203)
 at
 org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
 at
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
 at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
 at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
 at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
 at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

 Jul 20, 2009 5:50:50 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=null path=/update params={} status=500 QTime=6
 Cannot flush the index buffer : Server error while adding documents

 --
 Gérard Dupont
 Information Processing Control and Cognition (IPCC) - EADS DS
 http://weblab-project.org

 Document  Learning team - LITIS Laboratory




-- 
Gérard Dupont
Information Processing Control and Cognition (IPCC) - EADS DS
http://weblab-project.org

Document  Learning team - LITIS Laboratory


Exception searching PhoneticFilterFactory field with number

2009-07-20 Thread Robert Petersen
Reposting in hopes of an answer...

 

Hello all, 

 

I am getting the following exception whenever a user includes a numeric
term in their search and the search includes a field defined with a
PhoneticFilterFactory; furthermore, it occurs whether I use the
DoubleMetaphone encoder or any other.  Has this ever come up before?  I
can replicate this with no data in the index at all, but if I search the
field by hand from the Solr web interface there is no exception.  I am
running the Lucid Imagination 1.3 certified release in a multicore
master/slaves configuration.  I will include the field definition and the
search/exception below; let me know if I can include any more
clues... it seems like it's trying to make a field with no name/value:

 

<fieldType name="spellcheck" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/> -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/> -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>

 

Jul 17, 2009 2:42:18 PM org.apache.solr.core.SolrCore execute

INFO: [10017] webapp=/solr path=/select/
params={f.partitionId.facet.limit=10f.categoryId.facet.missing=falsef.
categoryId.facet.zeros=falsefacet=truefacet=truefacet=truefacet=true
facet=truefacet=truef.taxonomyCategoryId.facet.limit=-1f.priceBucket
id.facet.limit=-1f.partitionId.facet.zeros=falsef.categoryId.facet.sor
t=truef.categoryId.facet.limit=-1f.marketplaceIds.facet.limit=10f.mfg
Id.facet.missing=falsef.priceBucketid.facet.zeros=falsedebugQuery=true
f.priceBucketid.facet.sort=truef.partitionId.facet.missing=falsef.tax
onomyCategoryId.facet.zeros=falsef.priceBucketid.facet.missing=falsefa
cet.field=categoryIdfacet.field=taxonomyCategoryIdfacet.field=partitio
nIdfacet.field=mfgIdfacet.field=marketplaceIdsfacet.field=priceBucket
idf.mfgId.facet.zeros=falsef.taxonomyCategoryId.facet.sort=truef.mark
etplaceIds.facet.missing=falserows=48f.partitionId.facet.sort=truesta
rt=0q=(sku:va+AND+sku:2226+AND+sku:w))+OR+((upc:va+AND+upc:
2226+AND+upc:w))+OR+((mfgPartNo:va+AND+mfgPartNo:2226+AND+mfgPar
tNo:w))+OR+((title_en_uk:va+AND+title_en_uk:2226+AND+title_en_uk:
w))^8+OR+((moreWords_en_uk:va+AND+moreWords_en_uk:2226+AND+moreWord
s_en_uk:w))^2+OR+((allDoublemetaphone:va+AND+allDoublemetaphone:222
6+AND+allDoublemetaphone:w))^0.5)+AND+((_val_:sum\(product\(boosted,
30\),product\(sales,1000\),product\(views,10\),product\(image,100\)\
)f.taxonomyCategoryId.facet.missing=falsef.mfgId.facet.limit=10f
.marketplaceIds.facet.sort=truef.marketplaceIds.facet.zeros=falsef.mfg
Id.facet.sort=true} hits=0 status=500 QTime=84 

Jul 17, 2009 2:42:18 PM org.apache.solr.common.SolrException log

SEVERE: java.lang.RuntimeException: java.lang.IllegalArgumentException:
name and value cannot both be empty

at
org.apache.solr.search.QueryParsing.toString(QueryParsing.java:470)

at
org.apache.solr.util.SolrPluginUtils.doStandardDebug(SolrPluginUtils.jav
a:399)

at
org.apache.solr.handler.component.DebugComponent.process(DebugComponent.
java:54)

at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(Search
Handler.java:177)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB
ase.java:131)

at
org.apache.solr.core.SolrCore.execute(SolrCore.java:1205)

at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja
va:303)

at

Re: Implementing related tags

2009-07-20 Thread Avlesh Singh
Have a look at the MoreLikeThis component -
http://wiki.apache.org/solr/MoreLikeThis

Cheers
Avlesh

On Mon, Jul 20, 2009 at 8:05 PM, James T codetester.codetes...@gmail.comwrote:

 Hi,

 I have a specific requirement for searching and looking for some help from
 the community on how to achieve it using solr:

 I need to index 1million + documents. Each document contains ( among other
 fields ) 3 fields representing the category which that doc belongs to. For
 example ( a very simplied case to make it easier to explain )

 Doc 1
  Place : NY, Paris, Tokyo
  Authors: AuthorA, AuthorB, AuthorC, AuthorD
  Tags: tagA, tagB, ballon

 Doc 2
  Place : Bangkok
  Authors: AuthorD
  Tags: tagZ

 So each doc can contain multiple values for each of above fields ( place,
 author, tags )

 Now the searching requirements is that, by constrainting on one of the
 value, I need a search on related fields.

 Example: By giving a constraint Author: AuthorD, I need a search on the
 search space:
Place: Ny, Paris, Tokyo and London
Author: AuthorA, AuthorB, AuthorC,
Tags: tagA, tagB and tagZ
 ( The above result is generated by the fact that every item in the result
 has atleast 1 doc in common with AuthorD )

 So as I am typing Ba, I need to get Ballon and Bangkok ( These Tags and
 Places have atleast 1doc where it also had AuthorD )

 Is such a system possible to implement using solr?

 Thanks!



Re: SolrJ embedded server : error while adding document

2009-07-20 Thread Ryan McKinley

Not sure what you mean... yes, I guess...

You send a bunch of requests with add(doc/collection) and they are
not visible until you send commit().



On Jul 20, 2009, at 9:07 AM, Gérard Dupont wrote:

my mistake, pb with the buffer I added. But it raises a question :  
does solr
(using embedded server) has its own buffer mechanism in indexing or  
not ? I

guess not but I might be wrong.

2009/7/20 Gérard Dupont ger.dup...@gmail.com


Hi SolR guys,

I'm starting to play with SolR after few years with classic Lucene.  
I'm
trying to index a single document using the embedded server, but I  
got a
strange error which looks like XML parsing problem (see trace  
hereafter). To
add details, this is a simple Junit which create single document  
then pass
it to the server in a ArraylistSolrInputDocument. The document  
only have 2

fields id and text as it is described in the configuration.

ul 20, 2009 5:50:50 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: missing content stream
   at
org 
.apache 
.solr 
.handler 
.XmlUpdateRequestHandler 
.handleRequestBody(XmlUpdateRequestHandler.java:114)

   at
org 
.apache 
.solr 
.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java: 
131)

   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
   at
org 
.apache 
.solr 
.client 
.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java: 
147)

   at
org 
.apache 
.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java: 
217)

   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
   at
org 
.weblab_project 
.services.solr.SolrComponent.flushIndexBuffer(SolrComponent.java:132)

   at
org 
.weblab_project 
.services 
.solr.SolrComponentTest.testAddOneDocument(SolrComponentTest.java:66)

   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
sun 
.reflect 
.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

   at
sun 
.reflect 
.DelegatingMethodAccessorImpl 
.invoke(DelegatingMethodAccessorImpl.java:25)

   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:154)
   at junit.framework.TestCase.runBare(TestCase.java:127)
   at junit.framework.TestResult$1.protect(TestResult.java:106)
   at junit.framework.TestResult.runProtected(TestResult.java:124)
   at junit.framework.TestResult.run(TestResult.java:109)
   at junit.framework.TestCase.run(TestCase.java:118)
   at junit.framework.TestSuite.runTest(TestSuite.java:208)
   at junit.framework.TestSuite.run(TestSuite.java:203)
   at
org 
.eclipse 
.jdt 
.internal 
.junit 
.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)

   at
org 
.eclipse 
.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)

   at
org 
.eclipse 
.jdt 
.internal 
.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)

   at
org 
.eclipse 
.jdt 
.internal 
.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)

   at
org 
.eclipse 
.jdt 
.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java: 
386)

   at
org 
.eclipse 
.jdt 
.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java: 
196)


Jul 20, 2009 5:50:50 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=/update params={} status=500 QTime=6
Cannot flush the index buffer : Server error while adding documents

--
Gérard Dupont
Information Processing Control and Cognition (IPCC) - EADS DS
http://weblab-project.org

Document  Learning team - LITIS Laboratory





--
Gérard Dupont
Information Processing Control and Cognition (IPCC) - EADS DS
http://weblab-project.org

Document  Learning team - LITIS Laboratory




Re: SolrJ embedded server : error while adding document

2009-07-20 Thread Gérard Dupont
On Mon, Jul 20, 2009 at 18:35, Ryan McKinley ryan...@gmail.com wrote:

 you send a bunch of requests with add( doc/collection ) and they are not
 visible until you send commit()


That's what I meant thanks.

-- 
Gérard Dupont
Information Processing Control and Cognition (IPCC) - EADS DS
http://weblab-project.org

Document  Learning team - LITIS Laboratory
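
As Ryan says, documents sent with add() are not visible until commit(), so a client-side "buffer" is often just batching plus a final commit. A sketch in Python against a hypothetical client exposing add(docs)/commit() (mirroring the SolrServer.add/commit calls in the trace above), with a stub standing in for a real server:

```python
class BufferedIndexer:
    """Minimal client-side buffer: collect docs and flush them to a
    client in batches, committing once at the end.  `client` is any
    object with add(docs) and commit(); the interface is hypothetical."""
    def __init__(self, client, batch_size=100):
        self.client = client
        self.batch_size = batch_size
        self.buffer = []

    def add(self, doc):
        self.buffer.append(doc)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.client.add(self.buffer)   # sent, but not yet visible
            self.buffer = []

    def close(self):
        self.flush()
        self.client.commit()               # docs become visible here

# Stub client recording what a real server would receive:
class _StubClient:
    def __init__(self):
        self.batches, self.committed = [], False
    def add(self, docs):
        self.batches.append(list(docs))
    def commit(self):
        self.committed = True

stub = _StubClient()
indexer = BufferedIndexer(stub, batch_size=2)
for i in range(5):
    indexer.add({"id": i})
indexer.close()
print(len(stub.batches), stub.committed)  # 3 True
```

Forgetting close() (i.e. the commit) reproduces the symptom discussed here: everything is added, nothing is searchable.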


Indexing issue with XML control characters

2009-07-20 Thread Rupert Fiasco
During indexing I will often get this error:

SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal
character ((CTRL-CHAR, code 3))
 at [row,col {unknown-source}]: [2,1]
at 
com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)


By looking at this list and elsewhere I know that I need to filter out
most control characters so I have been employing this regex:

/[\x00-\x08\x0B\x0C\x0E-\x1F]/

But I still get the error. What is strange is that if I re-run my
indexing process after a failure, it will work on the previously failed
node and then error out on another node some time later. That is, it
is not deterministic. If I look at the text that is being
indexed, it's as pure as you can get (a bunch of medical keywords like
"leg bones" and "nose").

Any ideas would be greatly appreciated.

The platform is:

Solr implementation version: 1.3.0 694707
Lucene implementation version: 2.4-dev 691741
Mac OS X 10.5.7
JVM 1.5.0_19-b02-304


Thanks
/Rupert
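
The regex quoted above covers the low control range, but XML 1.0 also forbids \uFFFE/\uFFFF and unpaired surrogates; and the filter has to run over every field of the exact string being serialized, which could explain the non-deterministic failures if some fields are skipped. A sketch of a whitelist filter in Python built on the XML 1.0 Char production:

```python
import re

# Whitelist of the XML 1.0 "Char" production; everything outside it
# (including the \x03 CTRL-CHAR from the stack trace above) is illegal
# and must be stripped before the document is serialized.
_XML_ILLEGAL = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def xml_sanitize(text):
    """Strip every character that is not valid in an XML 1.0 document."""
    return _XML_ILLEGAL.sub("", text)

print(xml_sanitize("leg\x03 bones"))  # leg bones
```

A whitelist is safer than enumerating bad ranges: anything you forget is removed rather than let through.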


Re: How to configure Solr in Glassfish ?

2009-07-20 Thread Mark Miller
What have you tried? Deploying the Solr war should be pretty
straightforward. The main issue is likely setting solr.home. You likely have
a lot of options there: you can set a system property in the startup
script, set a system property in the webapp context xml (if you can locate
it), or I think Glassfish offers a GUI to set such things. There really
shouldn't be much more to it than that, but you should try and see what you
run into.
I haven't tried out Glassfish in a couple of years now.

-- 
- Mark

http://www.lucidimagination.com

On Mon, Jul 20, 2009 at 8:27 AM, huenzhao huenz...@126.com wrote:


 I want use glassfish as the solr search server, but I don't know how to
 configure.
 Anybody knows?

 enzhao...@gmail.com
 Thanks!

 --
 View this message in context:
 http://www.nabble.com/How-to-configure-Solr--in-Glassfish---tp24565758p24565758.html
 Sent from the Solr - User mailing list archive at Nabble.com.




--


Re: Solr JMX and Cacti

2009-07-20 Thread Edward Capriolo
On Mon, Jul 20, 2009 at 12:31 PM, Ryan McKinleyryan...@gmail.com wrote:

 On Jul 20, 2009, at 9:16 AM, Edward Capriolo wrote:

 On Mon, Jul 20, 2009 at 11:53 AM, Ryan McKinleyryan...@gmail.com wrote:

 On Jul 20, 2009, at 8:47 AM, Edward Capriolo wrote:

 Hey all,

 We have several deployments of Solr across our enterprise. Our largest
 one is a several GB and when enough documents are added an OOM
 exception is occurring.

 To debug this problem I have enable JMX. My goal is to write some
 cacti templates similar to the ones I have done for hadoop.
 http://www.jointhegrid.com/hadoop/. The only cacti template for solr I
 have found is old, broken and is using curl and PHP to try and read
 the values off the web interface. I have a few general
 questions/comments and also would like to know how others are dealing
 with this.

 1) SNMP has counters/gauges. With JMX it is hard to know what a
 variable is without watching it for a while. Some fields are obvious,
 (total_x) (cumulative_x) it is worth wild to add some notes in the
 MBEAN info to say works like counter works like gauge. This way a
 network engineer like me does not have to go code surfing to figure
 out how to graph them.

 Has anyone written up a list of what the attributes are, types, and
 what they mean?

 2) The values that are not counter style I am assuming are sampled,
 what is the sampling rate and is it adjustable?

 Any tips are helpful. Thank you,

 Check:

 http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/RequestHandlerBase.java

 For cacti, you should probably ignore the two 'rate' based calculations
 as
 they are just derivatives:
 lst.add(avgTimePerRequest, (float) totalTime / (float)
 this.numRequests);
 lst.add(avgRequestsPerSecond, (float) numRequests*1000 /
 (float)(System.currentTimeMillis()-handlerStart));




 Thanks Ryan,

 Actually, I typically graph the derivatives directly, as graphing a
 derivative is usually easier than writing cacti CDEFs, which can be
 fickle when exporting the templates between versions, but I see your
 point.

 However, do you see the point I was getting at? Without MBEAN info
 stating that these values are derivatives, I have to dig through source
 code. It is not a complaint, just a note that there seems to be so
 much work on JMX counters, yet a few words of description in the
 MBEAN info would eliminate the need to dig through the source tree
 when it actually comes time for someone to render these counters.


 no doubt -- i am unfamiliar with how these get passed to JMX (or where extra
 docs would be helpful) -- feel free to submit a patch that adds this info,
 perhaps to the wiki? javadoc? this way it will be easier for the next guy


 Also one more question on my mind, how are the JMX objects effected by
 a multi core deployment. Does each core have its own objects or are
 they shared?


 each core/handler gets its own object -- they are not shared across cores.


 Thank you,
 Edward


Ryan,

After adding a <jmx/> element in solrconfig.xml and setting some command
line -D options, JMX is available on a TCP port. From that point,
Java tools can read the values directly.

I have console programs that output the values so cacti 'data input
methods' can read the data in. I just subclass this:
http://www.jointhegrid.com/svn/hadoop-cacti-jtg/trunk/src/com/jointhegrid/hadoopjmx/JMXBase.java

The jconsole GUI tool then allows you to browse the JMX tree. If the
MBEAN info is filled in, it is displayed directly to the user. Patching
the attributes to have more verbose descriptions would be very
helpful. I will open a Jira for that.

Thanks,
Edward


RE: multi-word synonyms with multiple matches

2009-07-20 Thread Ensdorf Ken
 You haven't given us the full details on how you are using the
 SynonymFilterFactory (expand true or false?) but in general: yes the
 SynonymFilter finds the longest match it can.

Sorry - doing expansion at index time:
<filter class="solr.SynonymFilterFactory" synonyms="title_synonyms.txt"
ignoreCase="true" expand="true"/>


 If every svp is also a vp, then being explicit in your synonyms (when
 doing
 index-time expansion) should work...

 vp,vice president
 svp,senior vice president => vp,svp,senior vice president

That worked - thanks!
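The longest-match behavior can be sketched in plain Java. This is only a toy model of the lookup rule (the real SynonymFilter stacks expanded tokens at the same positions rather than flattening them into a list); the rule contents mirror the example above:

```java
import java.util.*;

public class LongestMatchSynonyms {

    // Greedy longest-match lookup over multi-word synonym keys: at each
    // position, try the longest candidate phrase first -- which is why
    // "senior vice president" wins over the embedded "vice president".
    static List<String> expand(List<String> tokens,
                               Map<List<String>, List<String>> rules, int maxLen) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); ) {
            int matched = 0;
            List<String> repl = null;
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 1; len--) {
                List<String> key = tokens.subList(i, i + len);
                if (rules.containsKey(key)) { matched = len; repl = rules.get(key); break; }
            }
            if (matched > 0) { out.addAll(repl); i += matched; }
            else out.add(tokens.get(i++));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<List<String>, List<String>> rules = new HashMap<>();
        rules.put(List.of("vice", "president"),
                  List.of("vp", "vice", "president"));
        rules.put(List.of("senior", "vice", "president"),
                  List.of("vp", "svp", "senior", "vice", "president"));
        // The 3-token rule wins, so the explicit expansion is applied:
        System.out.println(expand(List.of("senior", "vice", "president"), rules, 3));
        // -> [vp, svp, senior, vice, president]
    }
}
```

Without the explicit 3-token rule, "senior vice president" would never reach the shorter "vice president" rule, which is exactly why the expansion has to be spelled out.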



Re: Implementing related tags

2009-07-20 Thread James T
That does not seem to work. To further simplify the issue, assume
there is a multi-valued tag field and the number of docs is 1 million. By
constraining on a given tag, I need to search on the related tags.

So
Doc 1:
   tags: tagA, tagB, tagC, ball
Doc 2:
   tags: tagA, bat

Now constraining on tagA and searching for ba*, I need something like
http://localhost:8983/solr/memoir/select?fq=tag:tagA&q=(tags%3Aba*) and just
return the related tags ( not the docs where that tag is present )

tagA may be present in 20K docs ( of 1 million docs ), but tagA might have
totally 100 other related tags ( i.e. those 100 tags had appeared with tagA
in at least 1 doc ). So the search space ( by constraining on tagA ) is
100 and not 1 million.

Hope that helps in explaining the issue better.

Thanks!


On Mon, Jul 20, 2009 at 9:51 PM, Avlesh Singh avl...@gmail.com wrote:

 Have a look at the MoreLikeThis component -
 http://wiki.apache.org/solr/MoreLikeThis

 Cheers
 Avlesh

 On Mon, Jul 20, 2009 at 8:05 PM, James T codetester.codetes...@gmail.com
 wrote:

  Hi,
 
  I have a specific requirement for searching and looking for some help
 from
  the community on how to achieve it using solr:
 
  I need to index 1million + documents. Each document contains ( among
 other
  fields ) 3 fields representing the category which that doc belongs to.
 For
   example ( a very simplified case to make it easier to explain )
 
  Doc 1
   Place : NY, Paris, Tokyo
   Authors: AuthorA, AuthorB, AuthorC, AuthorD
   Tags: tagA, tagB, ballon
 
  Doc 2
   Place : Bangkok
   Authors: AuthorD
   Tags: tagZ
 
  So each doc can contain multiple values for each of above fields ( place,
  author, tags )
 
   Now the search requirement is that, by constraining on one of the
   values, I need a search on related fields.
 
  Example: By giving a constraint Author: AuthorD, I need a search on the
  search space:
 Place: Ny, Paris, Tokyo and London
 Author: AuthorA, AuthorB, AuthorC,
 Tags: tagA, tagB and tagZ
  ( The above result is generated by the fact that every item in the result
   has at least 1 doc in common with AuthorD )
 
  So as I am typing Ba, I need to get Ballon and Bangkok ( These Tags and
  Places have atleast 1doc where it also had AuthorD )
 
  Is such a system possible to implement using solr?
 
  Thanks!
 



Re: Implementing related tags

2009-07-20 Thread Bill Au
Faceting on tags will give you all the related tags, including the original
tag (tagA in your case).  You will have to filter out the original tag on
the client side if you don't want to show it.  With Solr 1.4, you will be
able to use a localParam to exclude the original tag from the results.  If
your tags field is analyzed, you will want to facet on a raw copy (using
copyField) of the tags.

If you want related tags that start with ba, you can use facet.prefix:

q=tags:tagA&facet=true&facet.field=tags&facet.mincount=1&facet.prefix=ba

Bill
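Bill's suggestion amounts to: constrain the result set on the original tag, then count only prefix-matching tags within it. A self-contained in-memory sketch of the logic that faceting performs (field and tag names are taken from the example above; this is a model, not Solr code):

```java
import java.util.*;

public class RelatedTags {

    // Count tags co-occurring with `tag`, restricted to a prefix -- the
    // in-memory equivalent of q=tags:tagA + facet.field=tags +
    // facet.mincount=1 + facet.prefix=ba, with the original tag dropped.
    static Map<String, Integer> related(List<Set<String>> docs, String tag, String prefix) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> doc : docs) {
            if (!doc.contains(tag)) continue;          // the q/fq constraint
            for (String t : doc) {
                if (t.equals(tag)) continue;           // exclude the original tag
                if (t.startsWith(prefix)) counts.merge(t, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
                new HashSet<>(List.of("tagA", "tagB", "tagC", "ball")),
                new HashSet<>(List.of("tagA", "bat")));
        System.out.println(related(docs, "tagA", "ba")); // -> {ball=1, bat=1}
    }
}
```

Note that the work is proportional to the docs matching tagA (20K in the example), never the full million: the constraint shrinks the counting space exactly as asked.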

On Mon, Jul 20, 2009 at 2:40 PM, Avlesh Singh avl...@gmail.com wrote:

 If I understood your problem correctly, faceting on the tags field is what
 you need. Try this -
 http://localhost:8983/solr/memoir/select?fq=tag:tagA&q=(tags%3Aba*)&facet=true&facet.field=tags&facet.mincount=1

 Notice the usage of facet parameters. Locate the facet_counts section in
 your response. If this is what you were looking for, then
 http://wiki.apache.org/solr/SimpleFacetParameters might be a good read.

 Cheers
 Avlesh




index version on slave

2009-07-20 Thread solr jay
If you ask for the index version of a slave instance, you always get a
version number of 0. Is this expected behavior?

I am using this url

http://slave_host:8983/solr/replication?command=indexversion

This request returns correct version on master.

If you use the 'details' command, you get the right version number (and
generation number, and it gives more than what you want).

Thanks,

-- 
J


Re: unable to run the solr in tomcat 5.0

2009-07-20 Thread aligu

try this:

java -Durl=http://localhost:8080/solr/update -jar post.jar filename.xml

it should work.

HH


uday kumar maddigatla wrote:
 
 hi
 
 you misunderstood my question.
 
 When I try to use the command java -jar post.jar *.*, it tries to post
 files to the Solr instance on port 8983. If we use Jetty, the default
 port number is 8983. But what about the case where we use Tomcat, which
 uses 8080 as its port?
 
 If we use Jetty we can access Solr with this address
 http://localhost:8983/solr.
 
 If we use Tomcat we can access Solr with this address 
 http://localhost:8080/solr.
 
 So if we use the above command (java -jar post.jar), it clearly shows this
 kind of message in the command prompt:
 
 C:\TestDocumetsjava -jar post.jar *.*
 SimplePostTool: version 1.2
 SimplePostTool: WARNING: Make sure your XML documents are encoded in
 UTF-8, other encodings are not currently supported
 SimplePostTool: POSTing files to http://localhost:8983/solr/update..
 SimplePostTool: POSTing file OIO_INV_579814008_14118.xml
 SimplePostTool: FATAL: Connection error (is Solr running at
 http://localhost:8983/solr/update ?): java.net.ConnectException:
 Connection refused: connect
 
 This means it is trying to post the files to the Solr instance running at
 http://localhost:8983/solr/update . But in my case Solr is running on port
 8080, and I can't change my Tomcat port number just because of Solr.
 
 Is there any other way in Solr to index the documents, rather than the
 command-line utility?
 
 
 Michael Ludwig-4 wrote:
 
 uday kumar maddigatla schrieb:
 
 My intention is to use 8080 as port.

 Is there any other way taht Solr will post the files in 8080 port
 
 Solr doesn't post, it listens.
 
 Use the curl utility as indicated in the documentation.
 
 http://wiki.apache.org/solr/UpdateXmlMessages
 
 Michael Ludwig
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/unable-to-run-the-solr-in-tomcat-5.0-tp23400759p24576184.html
Sent from the Solr - User mailing list archive at Nabble.com.
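On the last question in the thread -- indexing without the command-line utility -- any HTTP client can post an <add> message to the update handler, and the port simply comes from the URL, so Tomcat's 8080 works exactly like Jetty's 8983. A minimal sketch (the field names and example URL are illustrative, not prescriptive):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SimplePost {

    // Build a minimal <add> message for one document; "id" and "title"
    // are example field names -- use whatever your schema defines.
    static String addXml(String id, String title) {
        return "<add><doc>"
             + "<field name=\"id\">" + id + "</field>"
             + "<field name=\"title\">" + title + "</field>"
             + "</doc></add>";
    }

    // POST the XML to the update handler and return the HTTP status code.
    static int post(String solrUpdateUrl, String xml) throws Exception {
        HttpURLConnection con = (HttpURLConnection) new URL(solrUpdateUrl).openConnection();
        con.setDoOutput(true);
        con.setRequestMethod("POST");
        con.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = con.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return con.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        // Point at whatever port your container uses; requires a running Solr.
        System.out.println(post("http://localhost:8080/solr/update",
                                addXml("1", "hello")));
    }
}
```

Remember to send a <commit/> afterwards, just as post.jar does.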



Re: hierarchical faceting discussion

2009-07-20 Thread Erik Hatcher
I was particularly surprised by the SOLR-64 numbers.  What makes its
response so huge (and thus slow) to return the entire tree of facet
counts?


Erik


On Jul 19, 2009, at 5:35 PM, Erik Hatcher wrote:

I've posted the details of some experiments I just did comparing/ 
contrasting two approaches for faceting on documents within  
hierarchical structures: http://wiki.apache.org/solr/HierarchicalFaceting


I'm sure I'm only scratching the surface with the current
implementations of both SOLR-64 and SOLR-792.  Alternative
approaches are welcome!  As I said on the wiki page, there won't be
any single method that works in all cases - it will depend on how
the hierarchical counts are needed - as an entire tree?  (not likely
in large taxonomic cases!)  How are per-level pruned counts needed?
Implementation-wise, it seems like payloads could be useful for some
use cases.


What are the use cases?

What types and sizes of hierarchies are folks dealing with out there  
in the real world?


Erik





Re: Recommended Articles

2009-07-20 Thread Óscar Marín Miró
I personally love this book:

http://www.amazon.com/Building-Search-Applications-Lucene-LingPipe/dp/0615204252

It intermixes search with analysis: sentiment, named entity recognition, NLP
Pipelines and so on...

There's a little Nutch cameo too...

On Mon, Jul 20, 2009 at 4:56 PM, Mark Miller markrmil...@gmail.com wrote:

 dar...@ontrenet.com wrote:


 http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Dstripbooks&field-keywords=search&x=0&y=0



 Does anyone have links or books to recommended reading on search in
 general.
 Would like to see some literature on larger search concepts and ideas.
 --
 Jeff Newburn
 Software Engineer, Zappos.com
 jnewb...@zappos.com - 702-943-7562







 Check out: http://wiki.apache.org/lucene-java/InformationRetrieval

 Some good stuff there, though I don't think often updated.

 My favorite is this free gem:
 http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html

 --
 - Mark

 http://www.lucidimagination.com






-- 
“I may not believe in myself, but I believe in what I'm doing.”

-- Jimmy Page


Re: Obtaining SOLR index size on disk

2009-07-20 Thread Peter Wolanin
Actually, if you have a server enabled as a replication master, the
stats.jsp page reports the index size, so that information is
available in some cases.

-Peter

On Sat, Jul 18, 2009 at 8:14 AM, Erik Hatchere...@ehatchersolutions.com wrote:

 On Jul 17, 2009, at 8:45 PM, J G wrote:

 Is it possible to obtain the SOLR index size on disk through the SOLR API?
 I've read through the docs and mailing list questions but can't seem to find
 the answer.

 No, but it'd be a great addition to the /admin/system handler which returns
 lots of other useful trivia like the free memory, ulimit, uptime, and such.

        Erik





-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com
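Until such a handler exists, the size can also be computed outside Solr by summing the files under the index directory. A self-contained sketch (the data/index path is deployment-specific and only a default here):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IndexSize {

    // Sum the sizes of all regular files under a directory, recursing into
    // subdirectories -- e.g. a Solr core's data/index directory.
    static long sizeOnDisk(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                total += Files.isRegularFile(f) ? Files.size(f) : sizeOnDisk(f);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Pass the index directory as the first argument, or fall back to
        // the conventional relative path.
        System.out.println(sizeOnDisk(Paths.get(args.length > 0 ? args[0] : "data/index")));
    }
}
```

Note that during optimize or merge the on-disk size can transiently exceed the logical index size, so a snapshot like this is approximate.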


Re: Exception searching PhoneticFilterFactory field with number

2009-07-20 Thread Otis Gospodnetic

Robert,

Can you narrow things down by simplifying the query?  For example, I see
allDoublemetaphone:2226, which looks suspicious in the "give me the phonetic
version of the input" context. If you could narrow it down, we would
probably be able to help more.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Robert Petersen rober...@buy.com
 To: solr-user@lucene.apache.org
 Sent: Monday, July 20, 2009 12:11:38 PM
 Subject: Exception searching PhoneticFilterFactory field with number
 
 Reposting in hopes of an answer...
 
 
 
 Hello all, 
 
 
 
 I am getting the following exception whenever a user includes a numeric
 term in their search, and the search includes a field defined with a
 PhoneticFilterFactory and further it occurs whether I use the
 DoubleMetaphone encoder or any other.  Has this ever come up before?  I
 can replicate this with no data in the index at all, but if I search the
 field by hand from the solr web interface there is no exception.  I am
 running the lucid imagination 1.3 certified release in a multicore
 master/slaves configuration.  I will include the field def and the
 search/exception below and let me know if I can include any more
 clues... seems like it's trying to make a field with no name/value:  
 
 
 
 
 <fieldType name="..." positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     ...
     <filter class="..." protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     ...
     <filter class="..." protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
   </analyzer>
 </fieldType>
 
 
 
 
 
 
 
 Jul 17, 2009 2:42:18 PM org.apache.solr.core.SolrCore execute
 
 INFO: [10017] webapp=/solr path=/select/
 params={f.partitionId.facet.limit=10f.categoryId.facet.missing=falsef.
 categoryId.facet.zeros=falsefacet=truefacet=truefacet=truefacet=true
 facet=truefacet=truef.taxonomyCategoryId.facet.limit=-1f.priceBucket
 id.facet.limit=-1f.partitionId.facet.zeros=falsef.categoryId.facet.sor
 t=truef.categoryId.facet.limit=-1f.marketplaceIds.facet.limit=10f.mfg
 Id.facet.missing=falsef.priceBucketid.facet.zeros=falsedebugQuery=true
 f.priceBucketid.facet.sort=truef.partitionId.facet.missing=falsef.tax
 onomyCategoryId.facet.zeros=falsef.priceBucketid.facet.missing=falsefa
 cet.field=categoryIdfacet.field=taxonomyCategoryIdfacet.field=partitio
 nIdfacet.field=mfgIdfacet.field=marketplaceIdsfacet.field=priceBucket
 idf.mfgId.facet.zeros=falsef.taxonomyCategoryId.facet.sort=truef.mark
 etplaceIds.facet.missing=falserows=48f.partitionId.facet.sort=truesta
 rt=0q=(sku:va+AND+sku:2226+AND+sku:w))+OR+((upc:va+AND+upc:
 2226+AND+upc:w))+OR+((mfgPartNo:va+AND+mfgPartNo:2226+AND+mfgPar
 tNo:w))+OR+((title_en_uk:va+AND+title_en_uk:2226+AND+title_en_uk:
 w))^8+OR+((moreWords_en_uk:va+AND+moreWords_en_uk:2226+AND+moreWord
 s_en_uk:w))^2+OR+((allDoublemetaphone:va+AND+allDoublemetaphone:222
 6+AND+allDoublemetaphone:w))^0.5)+AND+((_val_:sum\(product\(boosted,
 30\),product\(sales,1000\),product\(views,10\),product\(image,100\)\
 )f.taxonomyCategoryId.facet.missing=falsef.mfgId.facet.limit=10f
 .marketplaceIds.facet.sort=truef.marketplaceIds.facet.zeros=falsef.mfg
 Id.facet.sort=true} hits=0 status=500 QTime=84 
 
 Jul 17, 2009 2:42:18 PM org.apache.solr.common.SolrException log
 
 SEVERE: java.lang.RuntimeException: java.lang.IllegalArgumentException:
 name and value cannot both be empty
 
 at
 org.apache.solr.search.QueryParsing.toString(QueryParsing.java:470)
 
 at
 org.apache.solr.util.SolrPluginUtils.doStandardDebug(SolrPluginUtils.jav
 a:399)
 
 at
 org.apache.solr.handler.component.DebugComponent.process(DebugComponent.
 java:54)
 
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(Search
 Handler.java:177)
 
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB
 ase.java:131)
 
 at
 org.apache.solr.core.SolrCore.execute(SolrCore.java:1205)
 
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja
 va:303)
 
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j
 ava:232)
 
 at
 

Re: method inform of SolrCoreAware callled 2 times

2009-07-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
it is not normal for inform() to be called twice for a single object.
Which version of Solr are you using?

On Mon, Jul 20, 2009 at 7:17 PM, Marc Sturlesemarc.sturl...@gmail.com wrote:

 Hey there,
 I have implemented a custom component which extends SearchComponent and
 implements SolrCoreAware.
 I have declared it in solrconfig.xml as:
  <searchComponent name="mycomp" class="solr.MyCustomComponent" />

 And added it to my SearchHandler as:
     <arr name="last-components">
       <str>mycomp</str>
     </arr>

 I am using multicore with two cores.
 I have noticed (doing some logging) that the inform method (the one that
 implements SolrCoreAware) is being called 2 times per core when I start
 my Solr instance. As I understood it, the SolrCoreAware inform method
 should be called just once per core. Am I right, or is it normal that it
 is called 2 times per core?


 --
 View this message in context: 
 http://www.nabble.com/method-inform-of-SolrCoreAware-callled-2-times-tp24570221p24570221.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


DocList Pagination

2009-07-20 Thread pof

Hi, I am trying to get the next DocList page in my custom search component.
Could I get a code example of this?

Cheers.
-- 
View this message in context: 
http://www.nabble.com/DocList-Pagination-tp24581850p24581850.html
Sent from the Solr - User mailing list archive at Nabble.com.