IndexSearcher and Caches

2010-05-21 Thread Rahul R
Hello all,
I have a few questions w.r.t. the caches and the IndexSearcher available in
Solr. I am using Solr 1.3.
- The Solr wiki states that the caches are per IndexSearcher object, i.e. if I
set my filterCache size to 1000, it means that 1000 entries can be assigned
for every IndexSearcher object. Is this true for queryResultCache,
filterCache and documentCache? For the documentCache, the wiki states that
the value should be greater than (number of records) * (max number of
queries). If the documentCache is also sized per IndexSearcher object, then
why do we need the (max number of queries) parameter in the formula?
- In a web application, where multiple users may log into the system and
query concurrently, should we assign a new IndexSearcher object to every
user? I tried sharing the IndexSearcher object but noticed that the search
criteria and filters of one user get carried over to another. Is there
some way to avoid that?
- Combining the above two: if the caches are per IndexSearcher object, and
if we have to assign a new IndexSearcher to every new user (in a web
application), will the total cache size not explode?

Apologies if these seem really basic. Thank you.

Regards
Rahul


Re: How real-time are Solr/Lucene queries?

2010-05-21 Thread Ben Eliott

You may wish to look at  Lucandra: http://github.com/tjake/Lucandra

On 21 May 2010, at 06:12, Walter Underwood wrote:

Solr is a very good engine, but it is not real-time. You can turn  
off the caches and reduce the delays, but it is fundamentally not  
real-time.


I work at MarkLogic, and we have a real-time transactional search
engine (and repository). If you are curious, contact me directly.


I do like Solr for lots of applications -- I chose it when I was at  
Netflix.


wunder

On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:


Hello Solr,

Solr looks like an excellent API, and it's nice to have a tutorial
that makes it easy to discover the basics of what Solr does; I'm
impressed. I can see plenty of potential uses of Solr/Lucene, and
I'm now interested in just how real-time the queries made to an
index can be.


For example, in my application I have time-ordered data being
processed by a paint method in real-time. Each piece of data is
identified and its associated renderer is invoked. The Java2D
renderer would then look up any layout and style values it requires
to render the current data it has received from the layout and
style indexes. What I'm wondering is whether this lookup, which
would be a Lucene search, will be fast enough.


Would it be best to make Lucene queries for the relevant layout and
style values required by the renderers ahead of rendering time, and
have the query results placed into the most performant collection
(map/array) so renderer lookup would be as fast as possible? Or can
Lucene handle many individual lookup queries fast enough that
rendering is quick?


Best regards from Canada,

Thom

Re: How real-time are Solr/Lucene queries?

2010-05-21 Thread Ben Eliott
Further to my earlier note re Lucandra: I note that Cassandra, which
Lucandra backs onto, is 'eventually consistent', so given your real-time
requirements, you may want to review this in the first instance, if
Lucandra is of interest.



Re: How real-time are Solr/Lucene queries?

2010-05-21 Thread Thomas J. Buhr
Thanks for the new information. It's really great to see so many options for
Lucene.

In my scenario there are the following pieces:

1 - A local Java client with an embedded Solr instance and its own local
index(es).
2 - A remote server running Solr with index(es) that are more like a repository
that local clients query for extra goodies.
3 - The client is also a JXTA node, so it can share indexes or documents too.
4 - There is no browser involved whatsoever.

My music composing application is a local client that uses configurations which
would become many different document types. A subset of these configurations
will be bundled with the application, and then many more would be made
available via a server (or servers) running Solr.

I would not expect the queries made from within the local client to be
returned in real-time. I would only expect such queries to be made in
reasonable time and returned to the client. The client would have its local
Lucene index system (embedded Solr using SolrJ) which would be updated with the
results of the query made to the Solr instance running on the remote server.

Then the user on the client would issue queries to the local Lucene index(es)
to obtain results which are used to set up contexts for different aspects of
the client. For example: an activated context of musical scales and rhythms
used for creating musical notes, or an activated context for rendering, with
layout and style information for different music symbol renderer types.

I'm not yet sure, but it may be best to make queries against the local Lucene
index(es) and then convert the results into some context objects, maybe an
array or map (I'd like to learn more about how query results can be returned
as arrays or maps as well). Then the tools and renderers which require the
information in the contexts would do any real-time lookup directly from the
context objects, not from the local or remote Lucene or Solr index(es). The
local client is also a JXTA node, so it can share its own index(es) with
fellow peers.
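For the arrays-or-maps part: with SolrJ this is just a matter of copying the
response into a plain collection. A minimal sketch (the type:styleConfig
query and the id field are invented for illustration; any SolrServer
implementation, including EmbeddedSolrServer, works here):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class StyleContextLoader {
        // Run one query ahead of rendering time and copy the results into
        // a plain map, so the paint path never touches Solr directly.
        public static Map<String, SolrDocument> loadContext(SolrServer solr)
                throws Exception {
            SolrQuery query = new SolrQuery("type:styleConfig");
            query.setRows(500);
            QueryResponse response = solr.query(query);
            Map<String, SolrDocument> context =
                new HashMap<String, SolrDocument>();
            for (SolrDocument doc : response.getResults()) {
                context.put((String) doc.getFieldValue("id"), doc);
            }
            return context;
        }
    }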

This is how I envision this happening with my limited knowledge of Lucene/Solr 
at this time. What are your thoughts on the feasibility of such a scenario?

I'm just reading through the Solr reference PDF now and looking over the Solr
admin application. Looking at schema.xml, it seems to be field-oriented, not
document-oriented. From my point of view I think in terms of configuration
types, which would be documents. In the schema it seems like only fields are
defined, and it does not matter which configuration/document they belong to?
I guess this is fine as long as the indexing takes into account my unique
document types and I can search for them as a whole as well, not only for
specific values across a set of indexed documents.

Also, does the schema allow me to index certain documents into specific
indexes, or are they all just bunched together? I'd rather have unique indexes
for specific document types. I've just read about multiple cores running under
one Solr instance; is this the only way to support multiple indexes?

I'm thinking of ordering the Lucene in Action (2nd edition) book, which is due
this month, and also the Solr 1.4 book. Before I do, I just need to understand
a few things, which is why I'm writing such a long message :-)

Thom



Re: IndexSearcher and Caches

2010-05-21 Thread MitchK

Rahul,

Solr's IndexSearcher is shared by every request between two commits.
That means one IndexSearcher and its caches have a lifetime of one commit;
after every commit, a new one is created.

The cache sizes do not mean that entries are created automatically. They mean
that a filter from a query will be cached, and whenever a user query
requires the same filtering criteria, Solr will use the cached filter
instead of creating a new one on the fly.

For example: fq=inStock:true
The result of this filtering criterion gets cached once. If another user
asks again with fq=inStock:true, Solr reuses the already existing filter.
Since such filters are cached as bit vectors, they are not large.
In this case it does not matter what the user is querying for in the q
parameter.
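For reference, these per-searcher caches are sized in solrconfig.xml; a
typical entry looks like this (the sizes are illustrative only):

    <!-- one instance per IndexSearcher; autowarmCount entries are
         copied into the new searcher's cache on commit -->
    <filterCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="128"/>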

BTW: the IndexSearcher is thread-safe, so there is no problem with concurrent
usage.

Hope this helps???

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/IndexSearcher-and-Caches-tp833567p833841.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 1.4 Enterprise Search Server book examples

2010-05-21 Thread Stefan Moises

Hi,

everybody who owns the book can now download the source code examples 
again, the zip file is fixed now - just got a message from Packt! :)

https://www.packtpub.com/support?nid=4191

Have fun :)
Cheers,
Stefan

On 06.05.2010 16:15, Antonello Mangone wrote:

I had the same problem and I sent a message ... I'm waiting for an answer :D

2010/5/6 Stefan Moises moi...@shoptimax.de


Hi fellow Solr users,

I have contacted Packt regarding this issue and they are already working on
fixing it, here is the friendly response I've received:

Dear Stefan,

Thank you for writing to PacktPub.com.

I'm sorry to know that you were not able to access the code files for our
title.

Our IT team is now investigating this issue, which they feel is due to the
heavy file size. They have also taken the author's help with this.

The author has split the example code file into parts due to its massive
size. The author is in the process of uploading a few record files to a
different site (MusicBrainz), so they have had to spend some time getting
permissions from them too. We are currently just waiting for him to send the
remaining part of the code to us, which will be made available on our website
shortly.

Please accept our apologies for the trouble.

With warm regards

For Packt Publishing

Verus Pereira
Sales Executive


I'll let you all know once they get back to me that the files are updated.

Cheers,
Stefan

On 27.04.2010 12:00, findbestopensource wrote:


I downloaded the 5883_Code.zip file but was not able to extract the complete
contents.

Regards
Aditya
www.findbestopensource.com



On Tue, Apr 27, 2010 at 12:45 AM, Johan Cwiklinski johan.cwiklin...@ajlsm.com wrote:

Hello,

On 26/04/2010 20:53, findbestopensource wrote:


 

I am able to successfully download the code. It is 360 MB and took a lot of
time to download.


   

I'm also able to download the file, but not to extract many of the files it
contains after download (I can list them but not extract them; an error
occurs).

Are you able to extract the ZIP archive you've downloaded?




 

https://www.packtpub.com/solr-1-4-enterprise-search-server/book
Select the download the code link and provide your email id; the download
link will be sent via email.

Regards
Aditya
www.findbestopensource.com



On Mon, Apr 26, 2010 at 8:34 PM, Abdelhamid ABID aeh.a...@gmail.com wrote:

Hi,
I'm also interested in getting those examples; would someone share them?

On 4/26/10, markus.rietz...@rzf.fin-nrw.de markus.rietz...@rzf.fin-nrw.de
wrote:

I have sent you a private mail.

markus



   

-----Original Message-----
From: Johan Cwiklinski [mailto:johan.cwiklin...@ajlsm.com]
Sent: Monday, April 26, 2010 10:58
To: solr-user@lucene.apache.org
Subject: Solr 1.4 Enterprise Search Server book examples

Hello,

We've recently acquired the Solr 1.4 Enterprise Search Server book.

I've tried to download the example ZIP file from the publisher's
website, but the file is actually corrupted, and I cannot unzip it :(

Could someone tell me if I can get these examples from
another location?

I sent a message last week to the publisher reporting the issue, but
it is not yet fixed; I'd really like to take a look at the
example code and run some tests.

Regards,
--
Johan Cwiklinski



 


   


--
Abdelhamid ABID
Software Engineer- J2EE / WEB



 


   

--
Johan Cwiklinski



 


   

--
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***


 
   


--
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***



Re: Solr 1.4 Enterprise Search Server book examples

2010-05-21 Thread Johan Cwiklinski
Hello,

On 21/05/2010 13:29, Stefan Moises wrote:
 Hi,
 
 everybody who owns the book can now download the source code examples
 again, the zip file is fixed now - just got a message from Packt! :)
 https://www.packtpub.com/support?nid=4191
 
 Have fun :)
 Cheers,
 Stefan

I've received the same message today; finally, I'll be able to take a look at
those examples :)

Regards,
-- 
Johan Cwiklinski
AJLSM


Wildcard queries

2010-05-21 Thread Sascha Szott

Hi folks,

what's the idea behind the fact that no text analysis (e.g. lowercasing) 
is performed on wildcarded search terms?


In my context this behaviour seems to be counter-intuitive (I guess 
that's the case in the majority of applications) and my application 
needs to lowercase any input term before sending the HTTP request to my 
Solr server.


Would it be easy to disable this behaviour in Solr (1.5)? I would like
to see a config parameter (per field type) that allows disabling this
odd behaviour if needed. To ensure backward compatibility, the odd
behaviour could remain the default.


Am I missing any drawbacks?

Best,
Sascha



Re: Wildcard queries

2010-05-21 Thread Robert Muir
We can use stemming as an example:

Let's say your query is c?ns?st?nt?y.

How will this match consistently, which the Porter stemmer
transforms to 'consistent'?
Furthermore, note that I replaced the vowels with ?'s here. The Porter
stemmer doesn't just rip stuff off the end, but attempts to guess
syllables as part of the process, so it cannot possibly work.

The only way it would work in this situation would be if you formed
permutations of all the possible words this wildcard could match,
then did analysis on each form, and searched on all stems.

But this is impossible, since the * operator allows an infinite language.






-- 
Robert Muir
rcm...@gmail.com


Re: Wildcard queries

2010-05-21 Thread Smiley, David W.
I absolutely consider this a bug too.  Cast your vote:
https://issues.apache.org/jira/browse/SOLR-219

~ David




Re: Wildcard queries

2010-05-21 Thread Sascha Szott

Hi Robert,

thanks, you're absolutely right. I should refine my initial question to:
what's the idea behind the fact that no *lowercasing* is performed on
wildcarded search terms if the field in question contains a LowercaseFilter
in its associated field type definition?


-Sascha







Re: Wildcard queries

2010-05-21 Thread Robert Muir
This lowercasing can 'sort of work' (depending on your analysis and
even language; not all case folding is as simple as English).

But the more general problem cannot be a bug, as it is mathematically
not possible to solve for queries like wildcards that allow an infinite
language, combined with non-reversible analysis.
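A minimal sketch of that client-side lowercasing workaround (it assumes the
target field's analysis chain is essentially just a LowerCaseFilter; as
noted, anything beyond simple casing breaks it):

    import java.util.Locale;

    public class WildcardNormalizer {
        // Solr does not analyze wildcarded terms, so lowercase them on
        // the client before building the query string.
        public static String normalize(String term) {
            return term.toLowerCase(Locale.ENGLISH);
        }

        public static void main(String[] args) {
            System.out.println(normalize("Mün?hen*")); // prints mün?hen*
        }
    }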







-- 
Robert Muir
rcm...@gmail.com


Re: Wildcard queries

2010-05-21 Thread Robert Muir
I honestly do not know the rationale behind this in Solr, except to
say that similar problems exist even if you reduce the scope to just
casing:

For example, if you are using a German stemmer, it will case-fold ß to
'ss' (such that it will match SS).

So doing some lowercasing at query time will not correct the situation
for that character, and furthermore it will be inconsistent with the
'?' operator (which only matches one character).








-- 
Robert Muir
rcm...@gmail.com


Re: Wildcard queries

2010-05-21 Thread Smiley, David W.
On May 21, 2010, at 10:35 AM, Robert Muir wrote:

 I honestly do not know the rationale behind this in Solr, except to
 say similar problems exist even if you reduce the scope to just
 casing:

Then why are you talking about stemming in the following example?  We know 
stemming is problematic with wildcard searching.  But casing... I argue not.

 For example, if you are using a german stemmer, it will case-fold ß to
 'ss' (such that it will match SS).
 
 So doing some lowercasing at query-time will not correct the situation
 for that character, and furthermore it will be inconsistent with the
 '?' operator... (which only matches one character)
 



Re: Wildcard queries

2010-05-21 Thread Robert Muir
On Fri, May 21, 2010 at 10:40 AM, Smiley, David W. dsmi...@mitre.org wrote:

 Then why are you talking about stemming in the following example?  We know 
 stemming is problematic with wildcard searching.  But casing... I argue not.


I just mentioned an example stemmer that properly case-folds this
German character.

Another tokenstream that does is one applying the Unicode case-folding
algorithm [requires code dependent on ICU at the moment].

LowerCaseFilter is *not* Unicode-compliant as far as casing goes;
toLowerCase is intended for display, not for case-insensitive
matching.
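For illustration, ICU4J exposes that Unicode case folding directly; a small
sketch (assumes the icu4j jar on the classpath):

    import com.ibm.icu.lang.UCharacter;

    public class FoldDemo {
        public static void main(String[] args) {
            // Case folding maps ß to ss, which toLowerCase() does not.
            System.out.println(UCharacter.foldCase("Straße", true)); // strasse
            System.out.println("Straße".toLowerCase());              // straße
        }
    }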


-- 
Robert Muir
rcm...@gmail.com


Re: How real-time are Solr/Lucene queries?

2010-05-21 Thread Dennis Gearon
Did your successor choose Solr? I seem to have read an article or seen a
'mobcast' where the Search Engine Guy (SEG) @ Netflix used Solr. (Or maybe it
was another video chain.)

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php




Re: How real-time are Solr/Lucene queries?

2010-05-21 Thread Walter Underwood
I chose it, and it doesn't look like they've replaced it in the eight months 
since I left.

At the time, I was the entire search engineering department, so it was me.

wunder


--
Walter Underwood
Venture ASM, Troop 14, Palo Alto





Re: Personalized Search

2010-05-21 Thread Rih
It will likely be what you suggested: one or two multi-valued fields. But
with 10,000+ members, does Solr scale with this schema?


On Thu, May 20, 2010 at 6:27 PM, findbestopensource 
findbestopensou...@gmail.com wrote:

 Hi Rih,

 Are you going to include either of the two fields bought or like per
 member/visitor, or a unique field per member/visitor?

 If one or two common fields are included, then there will not be any
 impact on performance. If you want to include a unique field, then you need
 to consider a multi-valued field, otherwise you will certainly hit the wall.

 Regards
 Aditya
 www.findbestopensource.com




 On Thu, May 20, 2010 at 12:13 PM, Rih tanrihae...@gmail.com wrote:

  Has anybody done personalized search with Solr? I'm thinking of including
  fields such as bought or like per member/visitor via dynamic fields to a
  product search schema. Another option is to have a multi-value field that
  can contain user IDs. What are the possible performance issues with this
  setup?

  Looking forward to your ideas.

  Rih
 



Re: Personalized Search

2010-05-21 Thread Rih
Well, it's not really a recommendation engine per se, but more of a filter
for the user. Say I already own some stuff from the result set; I just want
to exclude those from the results. What I'm concerned about is reindexing the
document every time someone marks/votes/likes/buys.


On Thu, May 20, 2010 at 11:04 PM, Ken Krugler
kkrugler_li...@transpac.com wrote:




 Mitch is right, what you're looking for here is a recommendation engine, if
 I understand your question properly.

 And yes, Mahout should work, though the Taste recommendation engine it
 supports is pretty new. But Sean Owen & Robin Anil have a Mahout in Action
 book that's in early release via Manning, and it has lots of good
 information about Mahout & recommender systems.

 Assuming you have a list of recommendations for a given user, based on
 their past behavior and the recommendation engine, then you could use this
 to adjust search results. I'm waiting for Hoss to jump in here on how best
 to handle that :)

 -- Ken

 
 Ken Krugler
 +1 530-210-6378
 http://bixolabs.com
 e l a s t i c   w e b   m i n i n g







Re: Personalized Search

2010-05-21 Thread Geert-Jan Brits
Just want to throw this in: If you're worried about scaling, etc. you could
take a look at item-based collaborative filtering instead of user based.
i.e:
DO NIGHTLY/ BATCH:
- calculate the similarity between items based on their properties

DO ON EACH REQUEST
- have a user store/update it's interest as a vector of item-properties. How
to update this based on click / browse behavior is the interesting thing and
depends a lot on your environment.
- Next is to recommend 'neighboring' items that are close to the defined
'interest-vector'.

The code is similar to user-based colab. filtering, but scaling is invariant
to the nr of users.

other advantages:
- new items/ products can be recommended as soon as they are added to the
catalog (no need for users to express interest in them before the item can
be suggested)

disadvantage:
- top-N results tend to be less dynamic then when using user-based colab.
filtering.
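A sketch of the 'closeness' computation at the heart of this approach --
plain cosine similarity over dense property vectors (Mahout Taste has its
own ItemSimilarity abstractions for the same job):

    public class Cosine {
        // Similarity between an item's property vector and the user's
        // interest vector; both assumed dense and of equal length.
        public static double similarity(double[] item, double[] interest) {
            double dot = 0, normItem = 0, normInterest = 0;
            for (int k = 0; k < item.length; k++) {
                dot          += item[k] * interest[k];
                normItem     += item[k] * item[k];
                normInterest += interest[k] * interest[k];
            }
            return dot / (Math.sqrt(normItem) * Math.sqrt(normInterest));
        }
    }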

Of course, this doesn't touch on how to integrate this with Solr. Perhaps
some combination with Mahout is indeed the best solution. I haven't given
this much thought yet I must say.
For info on Mahout Taste (+ an explanation on item-based filtering vs.
user-based filtering) see:
http://lucene.apache.org/mahout/taste.html

Cheers,
Geert-Jan

2010/5/21 Rih tanrihae...@gmail.com

  - keep the SOLR index independent of bought/like
  - have a db table with user prefs on a per item basis

 I have the same idea this far.

  at query time, specify boosts for 'my items' items

 I believe this works if you want to sort results by faved/not faved. But how
 does it scale if users have already favorited/liked hundreds of items? The
 query can be quite long.

 Looking forward to your idea.



 On Thu, May 20, 2010 at 6:37 PM, dc tech dctech1...@gmail.com wrote:

  Another approach would be to do query time boosts of 'my' items under
  the assumption that count is limited:
  - keep the SOLR index independent of bought/like
  - have a db table with user prefs on a per item basis
  - at query time, specify boosts for 'my items' items
 
  We are planning to do this in the context of document management where
  documents in 'my (used/favorited ) folders' provide a boost factor
  to the results.
 
 
 



Re: Which Solr to use?

2010-05-21 Thread Jim Blomo
On Tue, May 18, 2010 at 12:31 PM, Sixten Otto six...@sfko.com wrote:
 So features are being actively added to / code rearranged in
 trunk/4.0, with some of the work being back-ported to this branch to
 form a stable 3.1 release? Is that accurate?

 Is there any thinking about when that might drop (beyond the quite
 understandable when it's done)? Or, perhaps more reasonably, when it
 might freeze?

I'm also interested in the recommended testing branch (to borrow a
Debian term) to use.  I'm planning a deployment in 2 months or so and
have been experiencing too many problems with the older version of
Tika to use the 1.4 version.

Jim


Re: Which Solr to use?

2010-05-21 Thread Chris Hostetter

:  Is there any thinking about when that might drop (beyond the quite
:  understandable when it's done)? Or, perhaps more reasonably, when it
:  might freeze?

FWIW: I have no idea ... it's all a question of when someone takes
charge of the release process -- quite frankly, so much is in flux right
now (because of the java+solr code tree merges, *and* the decision to
create parallel dev branches so the trunk could be more aggressive about
API changes, *and* the decision to refactor modules) that I suspect a lot
of things kind of need to shake out before anyone is going to feel
comfortable doing a new release.

: I'm also interested in the recommend testing branch (to borrow a
: Debian term) to use.  I'm planning a deployment in 2 months or so and
: have been experiencing too many problems with the older version of
: Tika to use the 1.4 version.

FWIW: If the only problem you are having with 1.4 is that you want to
upgrade Tika, patching 1.4 to make the necessary changes to use a new
version of Tika is probably going to be far less invasive/risky than using
the 3x branch (but that is only my opinion, and I'm not even that well
informed about what it takes to upgrade the Tika dependency).



-Hoss



Re: Special Circumstances for embedded Solr

2010-05-21 Thread Ryan McKinley

 Any other commonly compelling reasons to use SolrJ?

The most compelling reason (I think) is that if you program against
the SolrJ API, you can switch between embedded/HTTP/streaming
implementations without changing anything.

This is great for our app, which is run either as a small local instance
or in a big enterprise setting.
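A sketch of what that looks like in practice (these are the 1.4-era SolrJ
class names; the URL and core name are made up):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class SearchClient {
        private final SolrServer server;

        // Application code depends only on the abstract SolrServer...
        public SearchClient(SolrServer server) {
            this.server = server;
        }

        public long count(String q) throws Exception {
            return server.query(new SolrQuery(q)).getResults().getNumFound();
        }

        // ...so the implementation is chosen in exactly one place:
        //   embedded: new SearchClient(new EmbeddedSolrServer(container, "core0"));
        //   remote:
        public static SearchClient remote() throws Exception {
            return new SearchClient(
                new CommonsHttpSolrServer("http://solr.example.com:8983/solr"));
        }
    }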

ryan


Re: Moving from Lucene to Solr?

2010-05-21 Thread Ryan McKinley
On Wed, May 19, 2010 at 6:38 AM, Peter Karich peat...@yahoo.de wrote:
 Hi all,

 while asking a question on stackoverflow [1] some other questions appear:
 Is SolrJ a recommended way to access Solr or should I prefer the HTTP
 interface?

SolrJ vs the HTTP interface?  That will just be a matter of taste.  If you
are working in Java, then SolrJ is likely a good option.



 How can I (j)unit-test Solr? (e.g. create+delete index via Java call)


If you want to mess with creating/removing indexes at runtime, see:
http://wiki.apache.org/solr/CoreAdmin
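For example, cores can be created and torn down over HTTP (parameters per
that wiki page; host and core names here are illustrative):

    http://localhost:8983/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2
    http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core2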


 Is Lucene faster than Solr? ... do you have experiences, preferable with
 the same index?

Solr is built on top of Lucene, so in that regard it is the same speed.
Depending on your app, the abstractions that Solr makes may make it
less efficient than working directly in Lucene.  Unless you have very
specialized needs, I doubt this will make a big difference.


DataImportHandler and running out of disk space

2010-05-21 Thread wojtekpia

I'm noticing some data differences between my database and Solr. About a week
ago my Solr server ran out of disk space, so now I'm observing how the
DataImportHandler behaves when Solr runs out of disk space. In a word, I'd
say it behaves badly! It looks like out-of-disk-space exceptions are treated
like any other document-level exception (so my updates report successful
completion). After running out of disk space I see the index shrink, then
updates continue until I run out of disk space again, then the index
shrinks, etc. I'm running a nightly build from December 2009. Has this
behaviour changed since then? Is there DIH configuration to fail on
certain exceptions?

Thanks,

Wojtek
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-and-running-out-of-disk-space-tp835125p835125.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH post import event listener for errors

2010-05-21 Thread Robert Zotter

I have a similar need so I've opened up a ticket:
http://issues.apache.org/jira/browse/SOLR-1922

Should be pretty trivial to add. 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-post-import-event-listener-for-errors-tp834645p835132.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Personalized Search

2010-05-21 Thread dc tech
In our specific case, we would get the user's folders and then do a
function query that provides a boost if the document.folder is in {my
folder list}.

Another approach that will work for our intranet use is to add the
userids in a multi-valued field as others have suggested.
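With the dismax handler, both variants can be expressed as a bq (boost
query) parameter at query time; a sketch with invented field names and ids:

    q=annual report&defType=dismax&qf=content&bq=folder:(f101 OR f205)^5.0

or, for the multi-valued userids field:

    q=annual report&defType=dismax&qf=content&bq=users:jsmith^5.0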



On 5/20/10, MitchK mitc...@web.de wrote:

 Hi dc,



 - at query time, specify boosts for 'my items' items

 Do you mean something like document-boost or do you want to include
 something like
 OR myItemId:100^100
 ?

 Can you tell us how you would specify document-boostings at query-time? Or
 are you querying something like a boolean field (i.e. isFavorite:true^10) or
 a numeric field?

 Kind regards
 - Mitch
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Personalized-Search-tp831070p832062.html
 Sent from the Solr - User mailing list archive at Nabble.com.


-- 
Sent from my mobile device


Re: Tips for developing your own RequestHandler?!

2010-05-21 Thread Chris Hostetter

: I would like to write my own RH for my system. Is there a howto on the web?
: I didn't find anything about it.

I would start by looking at how existing RequestHandlers are implemented 
-- the ones that ship with Solr are heavily refactored to reuse a lot of 
functionality, which can sometimes make it hard to follow what's going on, 
but at the same time they help make it clear where there is functionality
you can reuse.

My other big tip would be: rethink whether you really need to write a
RequestHandler.  Once upon a time this was the main type of plugin for
doing things at search time, but with the introduction of QParsers and
SearchComponents there are now usually much easier ways to do things -- if
you tell us what type of custom logic you want to write, folks might be
able to point out a simpler way to implement it.

: can I develop in the svn checkout and test it without building a new
: solr.war? debug?

if you take a look at the JARs that are included in the Solr release, you 
can compile against them -- you also don't have to rebuild the WAR with 
your custom classes, you can put them in a Jar that is loaded at 
runtime...
  http://wiki.apache.org/solr/SolrPlugins

As for debugging plugins: I tend to do it all via unit tests and stack
traces.  There are some base classes in Solr that make this fairly easy to
set up (TestHarness, AbstractSolrTestCase, and the new JUnit4 test base
class whose name escapes me at the moment).
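A minimal sketch of such a test against Solr 1.4's AbstractSolrTestCase
(the field names and the assertion are invented; point the two getters at
the config of the core under test):

    import org.apache.solr.util.AbstractSolrTestCase;

    public class MyHandlerTest extends AbstractSolrTestCase {
        public String getSchemaFile()     { return "schema.xml"; }
        public String getSolrConfigFile() { return "solrconfig.xml"; }

        public void testMyHandler() {
            // index one document, commit, then query it back
            assertU(adoc("id", "1", "name", "hello"));
            assertU(commit());
            assertQ(req("q", "name:hello"), "//result[@numFound='1']");
        }
    }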

: I set up Solr in Eclipse like this:
:
http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse
: but that way it's not possible to develop, right?

Can you clarify your question? ... if you followed those steps, I'm not
sure why you wouldn't be able to develop your plugin and test it in
Eclipse.


-Hoss



Re: Statistics exposed as JSON

2010-05-21 Thread Chris Hostetter

: Are the Solr 1.4 statistics like #docs, #docsPending etc. exposed in
: JSON format?


if you are referring to the output from stats.jsp, then no -- that is not
available in JSON format in Solr 1.4.

In future versions of Solr a new RequestHandler will replace stats.jsp
(and registry.jsp), making this info available in all the formats supported
by the ResponseWriters.


-Hoss



No hits returned from shard search on multi-core setup

2010-05-21 Thread TonyBray

I cannot get hits back and do not get a correct total number of records when
using shard searching.
I have 2 cores, core0 and core1.  Both have the same schema.xml and
solrconfig.xml (different datadirs in solrconfig.xml).
Our id field contains globally unique id's across both cores, but they use
the same id field (same schema.xml).
Issue exists when testing with Jetty and Tomcat.  Using Solr 1.4.1.
I found two other instances of this exact error on Google and neither has a
solution, just a description like mine with lots of responses.  Multi-core
searching is something we need due to data layout, including multiple
languages.

Details: 

Folder layout:

C:\apache-solr-1.4.0\example\solr_multicore\solr\cores
core0\data
core0\conf

core1\data
core1\conf

solr.xml

My solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- snip the comments from the Apache wiki -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>

Core0 search:

http://localhost:8080/solr/core0/select/?q=*:*&version=2.2&start=0&rows=10&indent=on

results:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="q">*:*</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="131" start="0">
    <doc>
...

Core1 search:

http://localhost:8080/solr/core1/select/?q=*:*&version=2.2&start=0&rows=10&indent=on

results:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
    <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="q">*:*</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="302" start="0">
    <doc>
...

Shard'd search:

http://localhost:8080/solr/core0/select?q=*:*&version=2.2&start=0&rows=10&indent=on&shards=localhost:8080/solr/core0,localhost:8080/solr/core1

results:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">31</int>
    <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="q">*:*</str>
      <str name="shards">localhost:8080/solr/core0,localhost:8983/solr/core1</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="423" start="0" />
</response>
  
Notice there are no docs. Also, numFound does not equal the total for both
cores (131+302=433, not 423).

Query info from Catalina.log:

May 21, 2010 12:27:32 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select
params={wt=javabin&isShard=true&rows=10&start=0&fsv=true&q=*:*&fl=sedocid,score&version=1}
hits=131 status=0 QTime=0
May 21, 2010 12:27:32 PM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/select
params={wt=javabin&isShard=true&rows=10&start=0&fsv=true&q=*:*&fl=sedocid,score&version=1}
hits=302 status=0 QTime=0
May 21, 2010 12:27:32 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select
params={wt=javabin&isShard=true&rows=10&start=0&ids=core0_IM10009_0,core0_IM10006_0,core0_IM10002_0,core0_IM10010_0,core0_IM10007_0,core0_IM10004_0,core0_IM10001_0,core0_IM10003_0,core0_IM10008_0,core0_IM10005_0&q=*:*&version=1}
status=0 QTime=0
May 21, 2010 12:27:32 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select
params={rows=10&start=0&indent=on&q=*:*&shards=localhost:8080/solr/core0,localhost:8080/solr/core1&version=2.2}
status=0 QTime=172

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/No-hits-returned-from-shard-search-on-multi-core-setup-tp835169p835169.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Non-English query via Solr Example Admin corrupts text

2010-05-21 Thread Chris Hostetter

: I wanted to improve the documentation in the solr wiki by adding in my
: findings.  However, when I try to log in and create a new account, I
: receive this error message:
: 
: You are not allowed to do newaccount on this page. Login and try again.
: 
: Does anyone know how I can get permission to add a page to the
: documentation?

Hmmm... yes, there definitely seems to be a problem with creating new wiki 
accounts on wiki.apache.org -- I've opened an issue with INFRA...

   https://issues.apache.org/jira/browse/INFRA-2726




-Hoss



Re: Personalized Search

2010-05-21 Thread dc tech
Excluding favorited items is an easier problem:
- get the results
- get the exclude list from the db
- scan the results and exclude the items in the list

You'd have to do some code to manage 'holes' in the result list, i.e.
fetch more, etc.

You could marry this with the Solr batch-based approach to reduce the holes:
- every night, update the item.users field. This can be a simple string
type of field.
- query with negative criteria, i.e.
   content:search_term AND -users:userid
- then do the steps outlined earlier




-- 
Sent from my mobile device


Re: TikaEntityProcessor on Solr 1.4?

2010-05-21 Thread Sixten Otto
2010/5/19 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com:
 I guess it should work because Tika Entityprocessor does not use any
 new 1.4 APIs

 On Wed, May 19, 2010 at 1:17 AM, Sixten Otto six...@sfko.com wrote:
 The TikaEntityProcessor class that enables DataImportHandler to
 process business documents was added after the release of Solr 1.4,
 ... Has anyone tried back-porting those changes to Solr 1.4?

Did you mean new 1.5 APIs (since TEP was added *after* 1.4 was
released)? Even then, that doesn't make a lot of sense to me, as at
least a couple of new things (the binary data sources) *were* added to
support TikaEntityProcessor.

I'm sorry if I'm being dense, but I'm having trouble understanding this answer.

Sixten


RE: Non-English query via Solr Example Admin corrupts text

2010-05-21 Thread Chris Hostetter

This should be fixed now -- please update the Jira issue if you have any 
other problems creating an account.

: Hmmm... yes, there definitely seems to be a problem with creating new wiki 
: accounts on wiki.apache.org -- i've opened an issue with INFRA...
: 
:https://issues.apache.org/jira/browse/INFRA-2726



-Hoss



RE: seemingly impossible query

2010-05-21 Thread Nagelberg, Kallin
I just realized something that may make the field collapsing strategy
insufficient. My 'ids' field is multi-valued. From what I've read, you cannot
field-collapse on a multi-valued field. Any other ideas?

Thanks,
-Kallin Nagelberg

-Original Message-
From: Geert-Jan Brits [mailto:gbr...@gmail.com] 
Sent: Thursday, May 20, 2010 1:03 PM
To: solr-user@lucene.apache.org
Subject: Re: seemingly impossible query

Hi Kallin,

again, please look at FieldCollapsing
(http://wiki.apache.org/solr/FieldCollapsing); that should do the trick.
Basically: first you constrain the field 'listOfIds' to only match docs
that contain any of the (up to) 100 random ids, as you already know how to
do.

Next, in the same query, specify to collapse on the field 'listOfIds',
basically:
q=listOfIds:1 OR listOfIds:10 OR listOfIds:24
&collapse.threshold=1&collapse.field=listOfIds&collapse.type=normal

This would return the top-matching doc for each id left in listOfIds. Since
you constrained this field by the ids specified, you are left with one
matching doc for each id.

Again, it is not guaranteed that all docs returned are different. Since you
didn't specify this as a requirement, I think this will suffice.

Cheers,
Geert-Jan

2010/5/20 Nagelberg, Kallin knagelb...@globeandmail.com

 Yeah I need something like:
 (id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that..

 I'm not sure how I can hit solr once. If I do try and do them all in one
 big OR query then I'm probably not going to get a hit for each ID. I would
 need to request probably 1000 documents to find all 100 and even then
 there's no guarantee and no way of knowing how deep to go.

 -Kallin Nagelberg

 -Original Message-
 From: dar...@ontrenet.com [mailto:dar...@ontrenet.com]
 Sent: Thursday, May 20, 2010 12:27 PM
 To: solr-user@lucene.apache.org
 Subject: RE: seemingly impossible query

 I see. Well, now you're asking Solr to ignore its prime directive of
 returning hits that match a query. Hehe.

 I'm not sure if Solr has a unique attribute.

 But this sounds, to me, like you will have to filter the results yourself.
 But at least you hit Solr only once before doing so.

 Good luck!

  Thanks Darren,
 
  The problem with that is that it may not return one document per id,
 which
  is what I need.  IE, I could give 100 ids in that OR query and retrieve
  100 documents, all containing just 1 of the IDs.
 
  -Kallin Nagelberg
 
  -Original Message-
  From: dar...@ontrenet.com [mailto:dar...@ontrenet.com]
  Sent: Thursday, May 20, 2010 12:21 PM
  To: solr-user@lucene.apache.org
  Subject: Re: seemingly impossible query
 
  Ok. I think I understand. What's impossible about this?
 
  If you have a single field named id that is multivalued,
  then you can retrieve the documents with something like:
 
  id:1 OR id:2 OR id:56 ... id:100
 
  then add limit 100.
 
  There's probably a more succinct way to do this, but I'll leave that to
  the experts.
 
  If you also only want the documents within a certain time, then you also
  create a time field and use a conjunction (id:0 ...) AND time:NOW-1H
  or something similar to this. Check the query syntax wiki for specifics.
 
  Darren
 
 
  Hey everyone,
 
  I've recently been given a requirement that is giving me some trouble. I
  need to retrieve up to 100 documents, but I can't see a way to do it
  without making 100 different queries.
 
  My schema has a multi-valued field like 'listOfIds'. Each document has
  between 0 and N of these ids associated with them.
 
  My input is up to 100 of these ids at random, and I need to retrieve the
  most recent document for each id (N Ids as input, N docs returned). I'm
  currently planning on doing a single query for each id, requesting 1
  row,
  and caching the result. This could work OK since some of these ids
  should
  repeat quite often. Of course I would prefer to find a way to do this in
  Solr, but I'm not sure it's capable.
 
  Any ideas?
 
  Thanks,
  -Kallin Nagelberg
 
 
 




Full Import failed

2010-05-21 Thread Mohamed Parvez
I am getting this error; any hint as to where I should look?

SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: isEmpty
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: java.lang.NoSuchMethodError: isEmpty
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:391)
    ... 5 more

P.S: I am using ClobTransformer and HTMLStripTransformer


Re: Full Import failed

2010-05-21 Thread Paul Libbrecht
The last time I encountered that exception, it was from a use of String.isEmpty,
which is a Java 1.6 novelty.

Could it be that you've been running Java 1.5?

paul


On 21 May 2010, at 22:44, Mohamed Parvez wrote:


SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: isEmpty
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)






Re: TikaEntityProcessor on Solr 1.4?

2010-05-21 Thread Chris Harris
You are right that TikaEntityProcessor has a couple of other prereqs
beyond stock Solr 1.4. I think the main point is that they're
relatively minor. I've merged TikaEntityProcessor, along with its prereqs
and dependencies, into my Solr 1.4 tree and it compiles fine,
though I haven't yet tested that TikaEntityProcessor actually works in
my setup.

Actually, rather than cherry-pick just the changes from SOLR-1358 and
SOLR-1583, what I did was to merge in all DataImportHandler-related
changes from between the 1.4 release up through Solr trunk r890679
(inclusive). I'm not sure if that's what would work best for you, but
it's one option.

On Fri, May 21, 2010 at 1:28 PM, Sixten Otto six...@sfko.com wrote:
 2010/5/19 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com:
 I guess it should work because Tika Entityprocessor does not use any
 new 1.4 APIs

 On Wed, May 19, 2010 at 1:17 AM, Sixten Otto six...@sfko.com wrote:
 The TikaEntityProcessor class that enables DataImportHandler to
 process business documents was added after the release of Solr 1.4,
 ... Has anyone tried back-porting those changes to Solr 1.4?

 Did you mean new 1.5 APIs (since TEP was added *after* 1.4 was
 released)? Even then, that doesn't make a lot of sense to me, as at
 least a couple of new things (the binary data sources) *were* added to
 support TikaEntityProcessor.

 I'm sorry if I'm being dense, but I'm having trouble understanding this 
 answer.

 Sixten



Re: DataImportHandler and running out of disk space

2010-05-21 Thread wojtekpia

I ran through some more failure scenarios (scenarios and results below). The
concerning ones in my deployment are when data does not get updated, but the
DIH's .properties file does. I could only simulate that scenario when I ran
out of disk space (all disk-space issues behaved consistently). Is this
worthy of a JIRA issue?




1. Successful import:
   all dates updated in .properties (title date updated; each [entity
   name].last_index_time updated to its own update time; last_index_time set to
   earliest entity update time)

2. Running out of disk space during import (in data directory only, conf
   directory still has space):
   no data updated, but dataimport.properties updated as in 1

3. Running out of disk space during import (in both data directory and conf
   directory):
   some data updated, but dataimport.properties updated as in 1

4. Running out of disk space during commit/optimize (in data directory only,
   conf directory still has space):
   no data updated, but dataimport.properties updated as in 1

5. Running out of disk space during commit/optimize (in both data directory and
   conf directory):
   no data updated, but dataimport.properties updated as in 1

6. File permissions prevent writing (on index directory):
   data not updated, failure reported, properties file not updated

7. File permissions prevent writing (on segment files):
   data updated, failure reported, properties file not updated

8. File permissions prevent writing (on .properties file):
   data updated, failure reported, properties file not updated

9. Shutting down Solr during import (killing process):
   data not updated, .properties not updated, no result reported

10. Shutting down Solr during import (issuing shutdown message):
    some data updated, .properties not updated, no result reported

11. DB connection lost (unplugging network cable):
    data not updated, .properties not updated, failure reported

12. Updating single entity fails (first one):
    data not updated, .properties not updated, failure reported

13. Updating single entity fails (after another one succeeds):
    data not updated, .properties not updated, failure reported







Re: Full Import failed

2010-05-21 Thread Mohamed Parvez
Yes, I am running Java 1.5. Any idea how we can run Solr 1.4 using Java 1.5?

---
Thanks/Regards,
Parvez



On Fri, May 21, 2010 at 4:17 PM, Paul Libbrecht p...@activemath.org wrote:

 The last time I encountered that exception, it was from a use of String.isEmpty,
 which is a Java 1.6 novelty.
 Could it be that you've been running Java 1.5?

 paul


On 21 May 2010, at 22:44, Mohamed Parvez wrote:


 SEVERE: Full Import failed
 org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: isEmpty
     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)





Re: TikaEntityProcessor on Solr 1.4?

2010-05-21 Thread Sixten Otto
On Fri, May 21, 2010 at 5:30 PM, Chris Harris rygu...@gmail.com wrote:
 Actually, rather than cherry-pick just the changes from SOLR-1358 and
 SOLR-1583, what I did was to merge in all DataImportHandler-related
 changes from between the 1.4 release up through Solr trunk r890679
 (inclusive). I'm not sure if that's what would work best for you, but
 it's one option.

I'd rather, of course, not have to build my own. But if I'm going
to dabble in the source at all, it's just a slippery slope from the
former to the latter. :-)  (My main hesitation in doing so would be
that I'm new enough to the code that I have no idea what core changes
the trunk's DIH might also depend on. And my Java's pretty rusty.)

How did you arrive at your patch? Just grafting the entire
trunk/solr/contrib/dataimporthandler onto 1.4's code? Or did you go
through Jira/SVN looking for applicable changesets?

I'll be very interested to hear how your testing goes!

Sixten


Re: Full Import failed

2010-05-21 Thread Paul Libbrecht

Fixing that precise line is very easy, and recompiling is easy as well.
But I am absolutely not sure this will be the only occurrence of a 1.6  
dependency.
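
For reference, the replacement is the usual pre-1.6 idiom; a helper along these
lines would do (a sketch only, since I haven't checked exactly where DocBuilder
calls isEmpty):

public class Java5Compat {
    // Java 1.5-compatible stand-in for String.isEmpty(), which only exists in
    // Java 1.6 and later.
    static boolean isEmpty(String s) {
        return s.length() == 0;
    }
}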


paul


On 21 May 2010, at 23:40, Mohamed Parvez wrote:


Yes, I am running Java 1.5. Any idea how we can run Solr 1.4 using Java 1.5?

---
Thanks/Regards,
Parvez



On Fri, May 21, 2010 at 4:17 PM, Paul Libbrecht  
p...@activemath.org wrote:


The last time I encountered that exception, it was from a use of
String.isEmpty, which is a Java 1.6 novelty.
Could it be that you've been running Java 1.5?

paul


On 21 May 2010, at 22:44, Mohamed Parvez wrote:


SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: isEmpty
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)











field collapsing on multi-valued field

2010-05-21 Thread Nagelberg, Kallin
As I understand from looking at
https://issues.apache.org/jira/login.jsp?os_destination=/browse/SOLR-236,
field collapsing has been disabled on multi-valued fields. Is this really necessary?

Let's say I have a multi-valued field, 'my-mv-field'. I have a query like 
(my-mv-field:1 OR my-mv-field:5) that returns docs with the following values 
for 'my-mv-field':

Doc1: 1, 2, 3
Doc2: 1, 3
Doc3: 2, 4, 5, 6
Doc4: 1

If I collapse on that field with that query, I imagine it should mean 'collect
the docs, starting from the top, so that I find 1 and 5'. In this case, if it
returned Doc1 and Doc3, I would be happy.
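
Client-side, the behaviour I'm after is essentially this greedy pass over a
relevance-ordered result list (just a sketch of what I'd otherwise write myself;
'my-mv-field' and the id set come from the example above):

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class GreedyCover {
    // Keep a doc only if it covers an id we have not seen yet; stop once all
    // requested ids are covered. Note: this consumes the 'wanted' set.
    public static List<SolrDocument> pick(SolrDocumentList results, Set<String> wanted) {
        List<SolrDocument> picked = new ArrayList<SolrDocument>();
        for (SolrDocument doc : results) {
            Collection<Object> ids = doc.getFieldValues("my-mv-field");
            if (ids == null) {
                continue;
            }
            for (Object id : ids) {
                if (wanted.remove(String.valueOf(id))) {
                    picked.add(doc);  // doc covers at least one outstanding id
                    break;
                }
            }
            if (wanted.isEmpty()) {
                break;  // all requested ids covered
            }
        }
        return picked;
    }
}

Run against the four docs above with wanted = {1, 5}, this keeps Doc1 and Doc3.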

There must be some ambiguity or implementation detail I am unaware of that is 
preventing this. It may be a critical piece of functionality for an application 
I'm working on, so I'm curious whether there is any point in pursuing development 
of this functionality or whether I am missing something.

Thanks,
Kallin Nagelberg


RE: Solr 1.4 Enterprise Search Server book examples

2010-05-21 Thread Robert Risley
I downloaded the examples and unzipped into
C:\Examples
C:\Examples\3
C:\Examples\7
C:\Examples\8
C:\Examples\9
C:\Examples\cores
C:\Examples\solr

Starting in the C:\Examples\solr folder, I run the command 'java -jar start.jar' and 
it starts OK, but all the URIs return 404.
I can get Solr running with Tomcat quite easily but wanted to try their Jetty 
version.

Chapter 1 (pages 15-18) just doesn't explain the out-of-the-box install of the examples.

I have tried
java -jar start.jar
java -Dsolr.solr.home=c:/Examples/solr/ -jar start.jar
java -Dsolr.solr.home=c:\Examples\solr\ -jar start.jar
java -Dsolr.solr.home=c:/Examples/solr -jar start.jar
java -Dsolr.solr.home=c:\Examples\solr -jar start.jar
java -Dsolr.solr.home=solr/ -jar start.jar

What am I missing?

--Robert

-Original Message-
From: Johan Cwiklinski [mailto:johan.cwiklin...@ajlsm.com] 
Sent: Friday, May 21, 2010 5:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 1.4 Enterprise Search Server book examples

Hello,

On 21/05/2010 13:29, Stefan Moises wrote:
 Hi,
 
 Everybody who owns the book can now download the source code examples 
 again; the zip file is fixed now - just got a message from Packt! :)
 https://www.packtpub.com/support?nid=4191
 
 Have fun :)
 Cheers,
 Stefan

I've received the same message today; finally, I'll be able to take a look at those 
examples :)

Regards,
--
Johan Cwiklinski
AJLSM


SolrJ/EmbeddedSolrServer

2010-05-21 Thread Ken Krugler
I've got a situation where my data directory (a) needs to live somewhere
other than inside Solr home, and (b) moves to a different location when
indexes are updated; also, (c) setting up a symlink from solr_home/data
isn't a great option.


So what's the best approach to making this work with SolrJ? The low-level
solution seems to be:


- create my own SolrCore instance, where I specify the data directory
- use that to update the CoreContainer
- create a new EmbeddedSolrServer
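
In code, I imagine that route looking roughly like this (very much a sketch: the
core name, paths, and the exact 1.4 constructor signatures are my assumptions, so
treat it as an outline rather than a vetted recipe):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.core.SolrCore;
import org.apache.solr.schema.IndexSchema;

public class ExternalDataDirSwap {
    // Re-point a core at an external data directory after an index move.
    public static SolrServer reopen(CoreContainer container, String instanceDir,
                                    String newDataDir) throws Exception {
        SolrConfig config = new SolrConfig(instanceDir, "solrconfig.xml", null);
        IndexSchema schema = new IndexSchema(config, "schema.xml", null);
        CoreDescriptor descriptor = new CoreDescriptor(container, "core1", instanceDir);
        // The dataDir argument is what lets the index live outside solr_home.
        SolrCore core = new SolrCore("core1", newDataDir, config, schema, descriptor);
        // Ask register() to hand back the previous core so we can close it.
        SolrCore old = container.register("core1", core, true);
        if (old != null) {
            old.close();
        }
        return new EmbeddedSolrServer(container, "core1");
    }
}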

But recreating the EmbeddedSolrServer with each index update feels  
wrong, and I'd like to avoid mucking around with low-level SolrCore  
instantiation.


Any other approaches?

Thanks,

-- Ken


Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: DIH post import event listener for errors

2010-05-21 Thread Robert Zotter

Added a patch on the latest trunk:
http://issues.apache.org/jira/browse/SOLR-1922


Re: DIH post import event listener for errors

2010-05-21 Thread David Smiley (@MITRE.org)

I'd consider using the logging framework.  I do this with Log4j in other
apps.  It's a generic approach that works for just about any system.
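
For instance, a minimal Log4j appender along these lines (a sketch; the DIH
logger-name prefix is an assumption you'd want to verify against your actual
log output):

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

public class DihErrorAppender extends AppenderSkeleton {
    // Fire a custom hook whenever the DataImportHandler logs an error.
    protected void append(LoggingEvent event) {
        if (event.getLevel().isGreaterOrEqual(Level.ERROR)
                && event.getLoggerName().startsWith("org.apache.solr.handler.dataimport")) {
            // React here: send an email, write a marker file, ping a monitor...
        }
    }

    public void close() {
    }

    public boolean requiresLayout() {
        return false;
    }
}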

~ David Smiley

-
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book


RE: Solr 1.4 Enterprise Search Server book examples

2010-05-21 Thread David Smiley (@MITRE.org)

Hello Rob, 
Thank you for buying the book.  I'm the lead author.  There is a README.txt
file in the root of the zip which includes a rather full invocation of java
to kick off Solr, to be used for the example data.  The options that are part
of the invocation should elucidate what's going on.  The layout of where
Solr's home is in relation to where Jetty is does not coincide with a
standard Solr distribution's example directory.  In hindsight, I should
have made it the same so as not to confuse people.  Sorry.

And I have no idea why the download got corrupted on Packt's server.  I made
a smaller distribution for them (~127MB vs 300-something) and put the data
files on MusicBrainz's servers; they are downloaded as part of the setup
script you should run.

~ David Smiley

-
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book