Re: Guide to using SolrQuery object

2009-08-09 Thread Aleksander M. Stensby
You'll find the available parameters in various interfaces in the package  
org.apache.solr.common.params.*


For instance:
import org.apache.solr.common.params.FacetParams;
import org.apache.solr.common.params.ShardParams;
import org.apache.solr.common.params.TermVectorParams;

As a side note to what Shalin said, SolrQuery extends ModifiableSolrParams  
(just so that you are aware of that).

Hope that helps a bit.

Cheers,
 Aleks

On Tue, 14 Jul 2009 16:27:50 +0200, Reuben Firmin   
wrote:


Also, are there enums or constants around the various param names that
can be passed in, or do people tend to define those themselves?
Thanks!
Reuben




--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.com
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Query on date fields

2009-06-12 Thread Aleksander M. Stensby

Hello,
for this you can simply use the nifty date functions supplied by Solr
(given that you have indexed your fields with the Solr date field type).


If I understand you correctly, you can achieve what you want with the
following query:


DisplayStartDate_dt:[* TO NOW] AND DisplayEndDate_dt:[NOW TO *]
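A minimal sketch of issuing that kind of query over plain HTTP with Python's standard library — the host/port are assumptions, and the field names follow the question above:

```python
from urllib.parse import urlencode

def active_now_query(start_field, end_field):
    """Match docs whose [start, end] date window contains NOW (Solr date math)."""
    return "%s:[* TO NOW] AND %s:[NOW TO *]" % (start_field, end_field)

q = active_now_query("DisplayStartDate_dt", "DisplayEndDate_dt")
# The query string must be URL-encoded before it goes into the q parameter:
params = urlencode({"q": q, "rows": 10})
url = "http://localhost:8983/solr/select?" + params  # host/port assumed
```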

Cheers,
 Aleksander



On Mon, 08 Jun 2009 09:17:26 +0200, prerna07   
wrote:





Hi,

I have two date attributes in my Indexes:

DisplayStartDate_dt
DisplayEndDate_dt

I need to fetch results where today's date lies between displayStartDate
and displayEndDate.

However I cannot send hardcoded displayStartDate and displayEndDate dates
in the query, as there are 1000 different dates in the indexes.

Please suggest the query.

Thanks,
Prerna








--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Configure Collection Distribution in Solr 1.3

2009-06-12 Thread Aleksander M. Stensby
As some people have mentioned here on this mailing list, the Solr 1.3
distribution scripts (snappuller / snapshooter, etc.) do not work on
Windows. Some have indicated that it might be possible to use Cygwin, but
I have my doubts. So unfortunately, Windows users suffer with regard to
replication (although I would recommend everyone to use Unix for running
servers ;) )


That being said, you can use Solr 1.4 (one of the nightly builds), where
you get built-in replication that is easily configured through the Solr
server configuration, and this works on Windows as well!


So, if you don't have any real reason not to upgrade, I suggest that you
try out Solr 1.4 (which also brings lots of new features and major
improvements!)
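For reference, a sketch of how that built-in replication is configured in solrconfig.xml on Solr 1.4, along the lines of the wiki documentation — the host name, config files, and poll interval here are illustrative, not from this thread:

```xml
<!-- On the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```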


Cheers,
 Aleksander


On Tue, 09 Jun 2009 21:00:27 +0200, MaheshR   
wrote:




Hi Aleksander ,


I went through the links below and successfully configured rsync using
Cygwin on Windows XP. The Solr documentation mentions many script files
like rsyncd-enable, snapshooter, etc. These are all Unix-based scripts.
Where do I get these script files for Windows?

Any help on this would be greatly appreciated.

Thanks
MaheshR.



Aleksander M. Stensby wrote:


You'll find everything you need in the Wiki.
http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline

http://wiki.apache.org/solr/SolrCollectionDistributionScripts

If things are still uncertain I've written a guide for when we used the
solr distribution scrips on our lucene index earlier. You can read that
guide here:
http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53

Cheers,
  Aleksander


On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR 
wrote:



Hi,

we configured multi-core solr 1.3 server in Tomcat 6.0.18 servlet
container.
Its working great. Now I need to configure collection Distribution to
replicate indexing data between master and 2 slaves. Please provide me
step
by step instructions to configure collection distribution between  
master

and
slaves would be helpful.

Thanks in advance.

Thanks
Mahesh.















Re: Search Phrase Wildcard?

2009-06-11 Thread Aleksander M. Stensby
Well, yes :) Solr does in fact support the entire Lucene query parser
syntax :)


- Aleks

On Thu, 11 Jun 2009 13:57:23 +0200, Avlesh Singh  wrote:


In fact, Lucene does not support that.

Lucene supports single and multiple character wildcard searches within

single terms (*not within phrase queries*).



Taken from
http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Wildcard%20Searches

Cheers
Avlesh

On Thu, Jun 11, 2009 at 4:32 PM, Aleksander M. Stensby <
aleksander.sten...@integrasco.no> wrote:


Solr does not support wildcards in phrase queries, yet.

Cheers,
 Aleks


On Thu, 11 Jun 2009 11:48:13 +0200, Samnang Chhun  


wrote:

 Hi all,

I have my document like this:



Solr web service



Is there any way that I can search like startswith:

"So* We*" : found
"Sol*": found
"We*": not found

Cheers,
Samnang













Re: Search Phrase Wildcard?

2009-06-11 Thread Aleksander M. Stensby

Solr does not support wildcards in phrase queries, yet.

Cheers,
 Aleks

On Thu, 11 Jun 2009 11:48:13 +0200, Samnang Chhun  
 wrote:



Hi all,
I have my document like this:



Solr web service



Is there any way that I can search like startswith:

"So* We*" : found
"Sol*": found
"We*": not found

Cheers,
Samnang






Re: Sharding strategy

2009-06-09 Thread Aleksander M. Stensby

Hi Otis,
thanks for your reply!
You could say I'm lucky (and I totally agree since I've made the choice of  
ordering the data that way:p).
What you describe is what I've thought about doing and I'm happy to read  
that you approve. It is always nice to know that you are not doing things  
completely off - that's what I love about this mailing list!


I've implemented a sharded "yellow pages" that builds up the shard  
parameter and it will obviously be easy to search in two shards to  
overcome the beginning of the year situation, just thought it might be a  
bit stupid to search for 1% of the data in the "latest shard" and the rest  
in shard n-1. How much of a performance decrease do you reckon I will get
from searching two shards instead of one?
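The "yellow pages" idea above can be sketched in a few lines — pick the yearly shards a date-bounded query must touch and join them into Solr's shards parameter. The host naming scheme below is made up for illustration:

```python
def shards_for_range(start_year, end_year, host_pattern="solr-{year}:8983/solr"):
    """Return the shards parameter value for a query bounded by year.

    host_pattern is a hypothetical naming scheme, not from the thread.
    """
    hosts = [host_pattern.format(year=y) for y in range(start_year, end_year + 1)]
    return ",".join(hosts)

# A query spanning New Year only needs the two newest shards:
print(shards_for_range(2008, 2009))
# solr-2008:8983/solr,solr-2009:8983/solr
```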


Anyways, thanks for confirming things, Otis!

Cheers,
 Aleksander




On Wed, 10 Jun 2009 07:51:16 +0200, Otis Gospodnetic  
 wrote:




Aleksander,

In a sense you are lucky you have time-ordered data.  That makes it very  
easy to shard and cheaper to search - you know exactly which shards you  
need to query.  The beginning of the year situation should also be  
easy.  Do start with the latest shard for the current year, and go to  
next shard only if you have to (e.g. if you don't get enough results  
from the first shard).


 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Aleksander M. Stensby 
To: "solr-user@lucene.apache.org" 
Sent: Tuesday, June 9, 2009 7:07:47 AM
Subject: Sharding strategy

Hi all,
I'm trying to figure out how to shard our index as it is growing  
rapidly and we

want to make our solution scalable.
So, we have documents that are most commonly sorted by their date. My  
initial
thought is to shard the index by date, but I wonder if you have any  
input on

this and how to best solve this...

I know that the most frequent queries will be executed against the  
"latest"
shard, but then let's say we shard by year, how do we best solve the  
situation
that will occur in the beginning of a new year? (Some of the data will  
be in the

last shard, but most of it will be on the second last shard.)

Would it be stupid to have a "latest" shard with duplicate data (always
consisting of the last 6 months or something like that) and maintain  
that index

in addition to the regular yearly shards? Any one else facing a similar
situation with a good solution?

Any input would be greatly appreciated :)

Cheers,
Aleksander












Sharding strategy

2009-06-09 Thread Aleksander M. Stensby

Hi all,
I'm trying to figure out how to shard our index as it is growing rapidly  
and we want to make our solution scalable.
So, we have documents that are most commonly sorted by their date. My  
initial thought is to shard the index by date, but I wonder if you have  
any input on this and how to best solve this...


I know that the most frequent queries will be executed against the  
"latest" shard, but then let's say we shard by year, how do we best solve  
the situation that will occur in the beginning of a new year? (Some of the  
data will be in the last shard, but most of it will be on the second last  
shard.)


Would it be stupid to have a "latest" shard with duplicate data (always  
consisting of the last 6 months or something like that) and maintain that  
index in addition to the regular yearly shards? Any one else facing a  
similar situation with a good solution?


Any input would be greatly appreciated :)

Cheers,
 Aleksander





Re: Multiple queries in one, something similar to a SQL "union"

2009-06-09 Thread Aleksander M. Stensby
I don't know if I follow you correctly, but are you saying that you want X
results per type? So you could do something like rows=X with query type:Y
etc. and merge the results?


- Aleks


On Tue, 09 Jun 2009 12:33:21 +0200, Avlesh Singh  wrote:

I have an index with two fields - name and type. I need to perform a  
search
on the name field so that *equal number of results are fetched for each  
type

*.
Currently, I am achieving this by firing multiple queries with a  
different

type and then merging the results.
In my database driven version, I used to do a "union" of multiple queries
(and not separate SQL queries) to achieve this.

Can Solr do something similar? If not, can this be a possible  
enhancement?


Cheers
Avlesh






Re: Solr Multiple Queries?

2009-06-09 Thread Aleksander M. Stensby

Hi there Samnang!
Please see inline for comments:

On Tue, 09 Jun 2009 08:40:02 +0200, Samnang Chhun  
 wrote:



Hi all,
I just got started looking at using Solr as my search web service. But I
don't know whether Solr has features for these kinds of queries:

- Startswith
This is what we call prefix queries and wildcard queries. For instance, if
you want something that starts with "man", you can search for man*



- Exact Match

Exact matching is done with quotation marks: "Solr rocks"


- Contain
Hmm, what do you mean by contain? Inside a given word? That might be a bit
more tricky. We have an issue open at the moment for supporting leading
wildcards, which might allow you to search for *cogn* and match
recognition etc. If that was what you meant, you can look at the ongoing
issue http://issues.apache.org/jira/browse/SOLR-218



- Doesn't Contain
NOT or - are keywords to exclude something (Solr supports all the boolean
operators that Lucene supports).



- In the range

Range queries in Solr are done using brackets. For instance,
price:[500 TO 1000]
will return all results with prices ranging from 500 to 1000.
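The query types above are all plain Lucene/Solr query strings that go into the q parameter of /select; a small sketch of composing and URL-encoding them with Python's standard library (the name field and the host are illustrative):

```python
from urllib.parse import quote_plus

prefix_q  = "name:man*"            # "startswith": prefix/wildcard query
exact_q   = '"Solr rocks"'         # exact match: quoted phrase
exclude_q = "solr NOT nutch"       # "doesn't contain": boolean NOT
range_q   = "price:[500 TO 1000]"  # "in the range": inclusive range query

# Each string goes into the q parameter of /select, URL-encoded:
url = "http://localhost:8983/solr/select?q=" + quote_plus(range_q)
```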

There is a lot of information on the Wiki that you should check out:
http://wiki.apache.org/solr/




Could anyone guide me how to implement those features in Solr?

Cheers,
Samnang



Cheers,
 Aleks




Re: Configure Collection Distribution in Solr 1.3

2009-06-08 Thread Aleksander M. Stensby

You'll find everything you need in the Wiki.
http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline

http://wiki.apache.org/solr/SolrCollectionDistributionScripts

If things are still uncertain I've written a guide for when we used the  
solr distribution scrips on our lucene index earlier. You can read that  
guide here:

http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53

Cheers,
 Aleksander


On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR   
wrote:




Hi,

we configured multi-core solr 1.3 server in Tomcat 6.0.18 servlet  
container.

Its working great. Now I need to configure collection Distribution to
replicate indexing data between master and 2 slaves. Please provide me  
step
by step instructions to configure collection distribution between master  
and

slaves would be helpful.

Thanks in advance.

Thanks
Mahesh.






Re: Terms Component

2009-06-08 Thread Aleksander M. Stensby
You can try out the nightly build of Solr (which is the Solr 1.4 dev
version), containing all the nice and shiny new features of Solr 1.4 :)
To use Terms Component you simply need to configure the handler as  
explained in the documentation / wiki.
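For reference, a sketch of the solrconfig.xml wiring this refers to, along the lines of the TermsComponent wiki page (the component and handler names are illustrative):

```xml
<searchComponent name="termsComponent"
                 class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/terms"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>
```

With that in place, requests to /solr/terms?terms.fl=name are routed to the terms handler instead of falling through to the standard query path.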


Cheers,
 Aleksander


On Mon, 08 Jun 2009 14:22:15 +0200, Anshuman Manur  
 wrote:



while on the subject, can anybody tell me when Solr 1.4 might come out?

Thanks
Anshuman Manur

On Mon, Jun 8, 2009 at 5:37 PM, Anshuman Manur
wrote:


I'm using Solr 1.3 apparently, and Solr 1.4 is not out yet.
Sorry, my mistake!


On Mon, Jun 8, 2009 at 5:18 PM, Anshuman Manur <
anshuman_ma...@stragure.com> wrote:


Hello,

I want to use the terms component in Solr 1.4:

http://localhost:8983/solr/terms?terms.fl=name

But I get the following error with the above query:

java.lang.NullPointerException
    at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37)
    at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104)
    at org.apache.solr.search.QParser.getQuery(QParser.java:88)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:148)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:84)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:295)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
    at org.ofbiz.catalina.container.CrossSubdomainSessionValve.invoke(CrossSubdomainSessionValve.java:44)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)


Any help would be great.

Thanks
Anshuman Manur










StreamingUpdateSolrServer recommendations?

2009-06-08 Thread Aleksander M. Stensby

Hi all,
I guess this question is mainly aimed at you, Ryan.
I've been trying out your StreamingUpdateSolrServer implementation for
indexing, and clearly see the improvements in indexing times compared to
the CommonsHttpSolrServer :)
Great work!

My question is, do you have any recommendations as to what values I should
use / have you found a "sweet-spot"? What are the trade-offs? Thread count
is obvious with regard to the number of cpus available, but what about the
queue size? Any thoughts? I tried 20 / 3 as you have posted in the issue
thread, and get averages of about 80 documents / sec (and I have not
optimized the document processing etc, which takes the larger part of the
time).

Anyways, I was just curious about what others are using (and what times
you are getting).

Keep up the good work!

   Aleks




Re: Initialising of CommonsHttpSolrServer in Spring framwork

2009-05-15 Thread Aleksander M. Stensby
Out of the box, the simplest way to configure CommonsHttpSolrServer
through a Spring application context is to define the bean for the
server and inject it into whatever class will use it, as Avlesh shared
below.
<bean id="solrServer"
      class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
  <constructor-arg value="http://localhost:8080/solr/core0"/>
</bean>

(bean id illustrative)

You can also set the connection parameters like Avlesh did with the  
HttpClient in the context, or directly in the init method of your  
implementation.

Inject it with a property:

<property name="solrServer" ref="solrServer"/>
It's a bit more tricky with the embedded Solr server, since you also need
to register cores etc. We solved that by creating a core-configuration
loader class.


- Aleks


On Sat, 09 May 2009 03:08:25 +0200, Avlesh Singh  wrote:


I am giving you a detailed sample of my spring usage.

<bean id="httpClient" class="org.apache.commons.httpclient.HttpClient">
  <!-- connection parameters were stripped from the archived message -->
</bean>

<bean id="solrServerCore1"
      class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
  <constructor-arg index="0" value="http://localhost/solr/core1"/>
  <constructor-arg index="1" ref="httpClient"/>
</bean>

<bean id="solrServerCore2"
      class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
  <constructor-arg index="0" value="http://localhost/solr/core2"/>
  <constructor-arg index="1" ref="httpClient"/>
</bean>

(bean ids illustrative)

Hope this helps.

Cheers
Avlesh

On Sat, May 9, 2009 at 12:39 AM, sachin78  
wrote:




Ranjeeth,

   Did you figure out how to do this? If yes, can you share with me how
you did it? An example bean definition in XML would be helpful.

--Sachin


Funtick wrote:
>
> Use constructor and pass URL parameter. Nothing SPRING related...
>
> Create a Spring bean with attributes 'MySolr', 'MySolrUrl', and 'init'
> method... 'init' will create instance of CommonsHttpSolrServer.  
Configure

> Spring...
>
>
>
>> I am using Solr 1.3 and Solrj as a Java Client. I am
>> Integarating Solrj in Spring framwork, I am facing a problem,
>> Spring framework is not initializing CommonsHttpSolrServer
>> class, how can  I define this class to get the instance of
>> SolrServer to invoke furthur method on this.
>>
>
>
>

--
View this message in context:
http://www.nabble.com/Initialising-of-CommonsHttpSolrServer-in-Spring-framwork-tp18808743p23451795.html
Sent from the Solr - User mailing list archive at Nabble.com.








Re: How do I accomplish this (semi-)complicated setup?

2009-03-26 Thread Aleksander M. Stensby
>> >> /removed, what's the best way to keep that in sync?
>> >>
>> >> 2. In the event that a repository that is private, is made public, how
>> >> easy would it be to run an "UPDATE" so to speak?
>> >>
>> >> Jesper
>> >>
>> >> > On Mar 25, 2009, at 12:52 PM, Jesper Nøhr wrote:
>> >> >
>> >> >> Hi list,
>> >> >>
>> >> >> I've finally settled on Solr, seeing as it has almost everything I
>> >> >> could want out of the box.
>> >> >>
>> >> >> My setup is a complicated one. It will serve as the search backend
>> >> >> on Bitbucket.org, a mercurial hosting site. We have literally
>> >> >> thousands of code repositories, as well as users and other data.
>> >> >> All this needs to be indexed.
>> >> >>
>> >> >> The complication comes in when we have private repositories. Only
>> >> >> select users have access to these, but we still need to index them.
>> >> >>
>> >> >> How would I go about accomplishing this? I can't think of a clean
>> >> >> way to do it.
>> >> >>
>> >> >> Any pointers much appreciated.
>> >> >>
>> >> >> Jesper
>> >> >
>> >> > -
>> >> > Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
>> >> > http://www.opensourceconnections.com
>> >> > Free/Busy: http://tinyurl.com/eric-cal









--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no

Please consider the environment before printing all or any of this e-mail


Re: Solrj: Getting response attributes from QueryResponse

2008-12-24 Thread Aleksander M. Stensby

Hello there Mark!
With SolrJ, you can simply do the following:
server.query(q) returns QueryResponse

the QueryResponse has the method getResults(), which returns a
SolrDocumentList. This is an extended list containing SolrDocuments, but
it also exposes methods such as getNumFound(), which is exactly what you
are looking for!

So you could do something like this (note that getNumFound() returns a
long):
long hits = solrServer.query(q).getResults().getNumFound();

and you have similar methods for the other attributes, like:
results.getMaxScore();
and
results.getStart();
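For readers querying Solr over plain HTTP instead of SolrJ, the same numbers appear as attributes on the <result> element of the XML response; a minimal stdlib sketch of pulling them out (the sample values are illustrative):

```python
import xml.etree.ElementTree as ET

# A trimmed Solr XML response; the attribute values are illustrative.
sample = """
<response>
  <result name="response" numFound="1228" start="0" maxScore="3.633028">
    <doc/>
  </result>
</response>
"""

result = ET.fromstring(sample).find("result")
num_found = int(result.get("numFound"))   # total hits, like getNumFound()
start = int(result.get("start"))          # offset, like getStart()
max_score = float(result.get("maxScore")) # like getMaxScore()
```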

Hope that helps.

Cheers, and merry Christmas!
 Aleks

On Fri, 19 Dec 2008 21:22:48 +0100, Mark Ferguson  
 wrote:



Hello,

I am trying to get the numFound attribute from a returned QueryResponse
object, but for the life of me I can't find where it is stored. When I
view a response in XML format, it is stored as an attribute on the
response node, e.g.:

<result name="response" numFound="1228" start="0" maxScore="3.633028">



However, I can't find a way to retrieve these attributes (numFound, start
and maxScore). When I look at the QueryResponse itself, I can see that the
attributes are being stored somewhere, because the toString method returns
them. For example, queryResponse.toString() returns:

{responseHeader={status=0,QTime=139,params={wt=javabin,hl=true,rows=15,version=2.2,fl=urlmd5,start=0,q=java}},response={*numFound=1228*,start=0,maxScore=3.633028,docs=[SolrDocument[{urlmd5=...

The problem is that when I call queryResponse.get('response'), all I get
is the list of SolrDocuments; I don't have any other attributes. Am I
missing something, or are these attributes just not publicly available?
If they're not, shouldn't they be? Thanks a lot,

Mark Ferguson






TermVectorComponent and SolrJ

2008-12-18 Thread Aleksander M. Stensby
Hello everyone, I've started to look at TermVectorComponent and I'm  
experimenting with the use of the component in a sort of "top terms"  
setting for a given query...
Was also looking at mlt and the interestingTerms, but I would like to do a  
query, get say 10k results, and from those results return a list of "top  
10 terms" or something similar...


Haven't really thought too much about it yet, but I was wondering if
anyone has done any work on making the term vector response available in
a simple manner with SolrJ yet? Or if this is planned? (In the same sense
as it is today with facets (response.getFacetFields() etc.).) Not that I
can't manage to write it myself, but I reckon that more people than me
would be interested in this. I'd be more than happy to contribute if it
is wanted; just wanted to check whether anyone has started on this
already or not.


Cheers,
 Aleks



Re: What are the scenarios when a new Searcher is created ?

2008-11-30 Thread Aleksander M. Stensby
When adding documents to Solr, the searcher will not be replaced; but once
you do a commit, (depending on settings) a new searcher will be opened
and warmed up while the old searcher is still open and used for
searching. Once the new searcher has finished its warmup procedure, the
old searcher is replaced with the new, warmed searcher, which will then
let you search the newest documents added to the index.


- Aleks

On Mon, 01 Dec 2008 01:32:05 +0100, souravm <[EMAIL PROTECTED]> wrote:


Hi All,

Say I have started a new Solr server instance using start.jar via the
java command. For this Solr server instance, when would a new Searcher
be created?


I am aware of following scenarios -

1. When the instance is started, a new Searcher is created for
autowarming. But I am not sure whether this searcher will continue to be
alive or will die after the autowarming is over.
2. When I do the first search in this server instance through select, a
new searcher is created, and from then on the same searcher is used for
all selects to this instance. Even if I run multiple search requests
concurrently, I see that the same Searcher is used to service those
requests.
3. When I try to add to the index in this instance through an update
statement, a new searcher is created.


Please let me know if there are any other situation when a new Searcher  
is created.


Regards,
Sourav










Re: Keyword extraction

2008-11-27 Thread Aleksander M. Stensby

Hi again Patrick.
Glad to hear that we can contribute to help you guys. That's what this
mailing list is for :)


First of all, I think you are using the wrong parameter to get your terms.
Take a look at
http://lucene.apache.org/solr/api/org/apache/solr/common/params/MoreLikeThisParams.html
to see the supported params.
In your string you use mlt.displayTerms=list, which I believe should be
mlt.interestingTerms=list.


If that doesn't work:
One thing you should know is that, from what I can tell, you are using the
StandardRequestHandler in your querying. The StandardRequestHandler
supports a simplified handling of more-like-this queries, namely: "This
method returns similar documents for each document in the response set."
It supports the common mlt parameters, needs mlt=true (as you have done),
and supports an mlt.count parameter to specify the number of similar
documents returned for each matching doc from your query.


If you want to get the "top keywords" etc. (and, in essence, for your
mlt.interestingTerms=list parameter to have any effect at all, if I'm not
completely wrong), you will need to configure a MoreLikeThisHandler in
your solrconfig.xml and then map your query to it.


From the sample configuration file:
	incoming queries will be dispatched to the correct handler based on the
path or the qt (query type) param. Names starting with a '/' are accessed
with a path equal to the registered name. Names without a leading '/'
are accessed with: http://host/app/select?qt=name. If no qt is defined,
the requestHandler that declares default="true" will be used.


You can read about the MoreLikeThisHandler here:  
http://wiki.apache.org/solr/MoreLikeThisHandler


Once you have it configured properly your query would be something like:
http://localhost:8983/solr/mlt?q=amsterdam&mlt.fl=text&mlt.interestingTerms=list&mlt=true  
(don't think you need the mlt=true here tho...)

or
http://localhost:8983/solr/select?qt=mlt&q=amsterdam&mlt.fl=text&mlt.interestingTerms=list&mlt=true
(in the last example I use qt=mlt)
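Either form of the request can be composed programmatically; a small sketch using Python's standard library (the host and port are assumptions):

```python
from urllib.parse import urlencode

# Parameters from the MoreLikeThisHandler example above:
params = urlencode({
    "q": "amsterdam",
    "mlt.fl": "text",
    "mlt.interestingTerms": "list",
})
mlt_url = "http://localhost:8983/solr/mlt?" + params  # host assumed
```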

Hope this helps.
Regards,
 Aleksander


On Thu, 27 Nov 2008 11:49:30 +0100, Plaatje, Patrick  
<[EMAIL PROTECTED]> wrote:



Hi Aleksander,

With all the help of you and the other comments, we're now at a point  
where a MoreLikeThis list is returned, and shows 10 related records.  
However on the query executed there are no keywords whatsoever being  
returned. Is the querystring still wrong or is something else required?


The querystring we're currently executing is:

http://suempnr3:8080/solr/select/?q=amsterdam&mlt.fl=text&mlt.displayTerms=list&mlt=true


Best,

Patrick

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: woensdag 26 november 2008 15:07
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

Ah, yes, That is important. In lucene, the MLT will see if the term  
vector is stored, and if it is not it will still be able to perform the  
querying, but in a much much much less efficient way.. Lucene will  
analyze the document (and the variable DEFAULT_MAX_NUM_TOKENS_PARSED  
will be used to limit the number of tokens that will be parsed). (don't  
want to go into details on this since I haven't really dug through the  
code:p) But when the field isn't stored either, it is rather difficult  
to re-analyze the document ;)

On a general note, if you want to "really" understand how the MLT works,  
take a look at the wiki or read this thorough blog post:

http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/

Regards,
  Aleksander

On Wed, 26 Nov 2008 14:41:52 +0100, Plaatje, Patrick  
<[EMAIL PROTECTED]> wrote:



Hi Aleksander,

This was a typo on my end; the original query included a colon
instead of an equal sign. But I think it has to do with my field not
being stored and not being identified as termVectors="true". I'm
recreating the index now, and see if this fixes the problem.

Best,

patrick

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: woensdag 26 november 2008 14:37
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

Hi there!
Well, first of all I think you have an error in your query, if I'm not
mistaken.
You say http://localhost:8080/solr/select/?q=id=18477975...
but since you are referring to the field called "id", you must say:
http://localhost:8080/solr/select/?q=id:18477975...
(use colon instead of the equals sign).
I think that will do the trick.
If not, try adding the &debugQuery=on at the end of your request url,
to see debug output on how the query is parsed and if/how any
documents are matched against your query.
Hope this helps.

Cheers,
  Aleksander



On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick
<[EMAIL PROTECTED]> wrote:

Re: facet.sort and distributed search

2008-11-26 Thread Aleksander M. Stensby
This is a known issue but take a look at the following jira issue and the  
patch supplied there:

https://issues.apache.org/jira/browse/SOLR-764

Haven't tried it myself, but I believe it should do the trick for you.
Hope that helps.

Cheers,
 Aleksander

On Wed, 26 Nov 2008 22:53:21 +0100, Grégoire Neuville  
<[EMAIL PROTECTED]> wrote:



Hi,

I'm working on a web application, one functionality of which consists
in presenting to the user a list of terms to type into a form field,
sorted alphabetically. As long as a single index was concerned, I
used Solr facets to produce the list and it worked fine. But I must
now deal with several indices, and thus use the distributed search
capability of Solr, which forbids the use of "facet.sort=false".

I would like to know if someone plans to, or is even working on, the
implementation of the natural facet sorting in case of a distributed
search.

Thanks a lot,




--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Can a lucene document be used in solr?

2008-11-26 Thread Aleksander M. Stensby

Hello there,
 do you mean a Lucene Document, or are you asking whether it is possible
to use an existing Lucene index with Solr?
In the latter case, the answer is yes, since Solr is built on top of
Lucene. But it requires you to configure your schema.xml to correlate to
the index structure of your existing Lucene index. As for documents, Solr
takes what is called a SolrInputDocument as input if you are using SolrJ,
or XML if you are using HTTP. Don't know if that answered your question
or not..

 Regards,
 Aleksander


On Thu, 27 Nov 2008 05:55:06 +0100, Sajith Vimukthi <[EMAIL PROTECTED]>  
wrote:



Hi all,

Can someone tell me whether I can use a Lucene document in Solr?



Regards,


Sajith Vimukthi Weerakoon

Associate Software Engineer | ZONE24X7

| Tel: +94 11 2882390 ext 101 | Fax: +94 11 2878261 |

http://www.zone24x7.com






--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Keyword extraction

2008-11-26 Thread Aleksander M. Stensby
Ah, yes, That is important. In lucene, the MLT will see if the term vector  
is stored, and if it is not it will still be able to perform the querying,  
but in a much much much less efficient way.. Lucene will analyze the  
document (and the variable DEFAULT_MAX_NUM_TOKENS_PARSED will be used to  
limit the number of tokens that will be parsed). (don't want to go into  
details on this since I haven't really dug through the code:p) But when  
the field isn't stored either, it is rather difficult to re-analyze the  
document;)


On a general note, if you want to "really" understand how the MLT works,  
take a look at the wiki or read this thorough blog post:  
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/


Regards,
 Aleksander

On Wed, 26 Nov 2008 14:41:52 +0100, Plaatje, Patrick  
<[EMAIL PROTECTED]> wrote:



Hi Aleksander,

This was a typo on my end; the original query included a colon
instead of an equal sign. But I think it has to do with my field not
being stored and not being identified as termVectors="true". I'm  
recreating the index now, and see if this fixes the problem.


Best,

patrick

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: woensdag 26 november 2008 14:37
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

Hi there!
Well, first of all I think you have an error in your query, if I'm not
mistaken.

You say http://localhost:8080/solr/select/?q=id=18477975...
but since you are referring to the field called "id", you must say:
http://localhost:8080/solr/select/?q=id:18477975...
(use colon instead of the equals sign).
I think that will do the trick.
If not, try adding the &debugQuery=on at the end of your request url, to  
see debug output on how the query is parsed and if/how any documents are  
matched against your query.

Hope this helps.

Cheers,
  Aleksander



On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick  
<[EMAIL PROTECTED]> wrote:



Hi Aleksander,

Thanx for clearing this up. I am confident that this is a way to
explore for me as I'm just starting to grasp the matter. Do you know
why I'm not getting any results with the query posted earlier, then? It
gives me the following only:


 

Instead of delivering details of the interestingTerms.

Thanks in advance

Patrick


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: woensdag 26 november 2008 13:03
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

I do not agree with you at all. The concept of MoreLikeThis is based
on the fundamental idea of TF-IDF weighting, and not term frequency  
alone.

Please take a look at:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html
As you can see, it is possible to use cut-off
thresholds to significantly reduce the number of unimportant terms,
and generate highly suitable queries based on the tf-idf frequency of
the term, since as you point out, high frequency terms alone tend to
be useless for querying, but taking the document frequency into
account drastically increases the importance of the term!

In solr, use parameters to manipulate your desired results:
http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c
For instance:
mlt.mintf - Minimum Term Frequency - the frequency below which terms
will be ignored in the source doc.
mlt.mindf - Minimum Document Frequency - the frequency at which words
will be ignored which do not occur in at least this many docs.
You can also set thresholds for term length etc.

Hope this gives you a better idea of things.
- Aleks
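To make the cutoff idea concrete, here is a small self-contained Java sketch (not Solr code) that ranks candidate terms by tf-idf and applies mintf/mindf-style thresholds as the parameters above describe; the demo numbers and the simplified idf formula are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TermRanker {
    // A candidate term from the source document: its text, its frequency in
    // the document (tf), and the number of index docs containing it (df).
    record Term(String text, int tf, int df) {}

    // Keeps terms passing the minTf/minDf cutoffs and ranks them by
    // tf * log(N/df), a simplified tf-idf, highest first.
    public static List<String> topTerms(List<Term> terms, int numDocs,
                                        int minTf, int minDf, int maxTerms) {
        List<Term> kept = new ArrayList<>();
        for (Term t : terms) {
            if (t.tf() >= minTf && t.df() >= minDf) kept.add(t);
        }
        kept.sort(Comparator.comparingDouble(
                (Term t) -> t.tf() * Math.log((double) numDocs / t.df()))
            .reversed());
        return kept.subList(0, Math.min(maxTerms, kept.size()))
                   .stream().map(Term::text).toList();
    }

    // Demo data: a stopword-like term, two topical terms, and a typo that
    // falls below the minimum term frequency.
    public static List<String> demo() {
        List<Term> terms = List.of(
            new Term("the", 40, 950),     // frequent everywhere -> low idf
            new Term("solr", 5, 30),
            new Term("lucene", 3, 40),
            new Term("rare-typo", 1, 1)); // dropped: tf < 2
        return topTerms(terms, 1000, 2, 5, 3);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // topical terms outrank the stopword
    }
}
```

The real MoreLikeThis adds more heuristics (term-length limits, boosts, stop sets), but the cutoff-then-rank shape is the same.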

On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]>
wrote:


Dear Patrick, I had the same problem with the MoreLikeThis function.

After briefly reading and analyzing the source code of the moreLikeThis
function in Solr, I concluded:

MoreLikeThis uses term vectors to rank all the terms from a document
by its frequency. According to its ranking, it will start to generate
queries, artificially, and search for documents.

So, moreLikeThis will retrieve related documents by artificially
generating queries based on most frequent terms.

There's a big problem with "most frequent terms" from documents.
Most frequent words are usually meaningless, so-called function
words, or, as people from Information Retrieval like to call them,
stopwords.

However, ignoring technical problems of the implementation of the
moreLikeThis function, this approach is very dangerous, since queries
are generated artificially based on a given document.
Writing queries for retrieving a document is a human task, and it
assumes some knowledge (the user knows what document he wants).

I advise using other approaches, depending on your expectations. For
example, you can extract similar documents just by searching for
documents with a similar title (m

Re: Keyword extraction

2008-11-26 Thread Aleksander M. Stensby
I'm sure that for certain problems and cases you will need to do quite a
bit of tweaking to make it work (to suit your needs), but I responded to
your statement because you made it sound like the MoreLikeThis component
does not work at all for its purpose, while it actually does work as
intended and can be of great aid in constructing queries to retrieve
same-topic documents etc.


- Aleksander

On Wed, 26 Nov 2008 14:10:57 +0100, Scurtu Vitalie <[EMAIL PROTECTED]>  
wrote:



Yes, I totally understand, and agree. 

MoreLikeThis uses TF-IDF to rank terms, then it generates queries based  
on top ranked terms.  In any case, I wasn't able to make it work after  
many attempts.


Finally, I've used a different method for query generation, and it
works better, or at least gives some results, while with moreLikeThis
the results were poor or absent altogether.


Note that my index was composed of short documents, so the
intersection between the top-ranked TF-IDF terms was the empty
set. MoreLikeThis works better when you have long documents.


Yes, I've changed the thresholds for min TFIDF and max TFIDF, and others  
parameters.


I've also used the "mlt.maxqt" parameter to increase the number of terms
used in query generation, but it still didn't work well, since generating
queries from the terms with the highest TF-IDF score doesn't produce a
representative query for the document. I wasn't able to tune it. For a
low value such as mlt.maxqt=3,4 the results were poor, while for
mlt.maxqt=5,6 it gave too many and irrelevant results.




Thank you,
Best Wishes,
Vitalie Scurtu



--- On Wed, 11/26/08, Aleksander M. Stensby  
<[EMAIL PROTECTED]> wrote:

From: Aleksander M. Stensby 
Subject: Re:  Keyword extraction
To: solr-user@lucene.apache.org
Date: Wednesday, November 26, 2008, 1:03 PM

I do not agree with you at all. The concept of MoreLikeThis is based on the
fundamental idea of TF-IDF weighting, and not term frequency alone.
Please take a look at:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html
As you can see, it is possible to use cut-off thresholds to significantly
reduce the number of unimportant terms, and generate highly suitable queries
based on the tf-idf frequency of the term, since as you point out, high
frequency terms alone tend to be useless for querying, but taking the document
frequency into account drastically increases the importance of the term!

In solr, use parameters to manipulate your desired results:
http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c
For instance:
mlt.mintf - Minimum Term Frequency - the frequency below which terms will be
ignored in the source doc.
mlt.mindf - Minimum Document Frequency - the frequency at which words will be
ignored which do not occur in at least this many docs.
You can also set thresholds for term length etc.

Hope this gives you a better idea of things.
- Aleks

On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]>
wrote:


Dear Patrick, I had the same problem with the MoreLikeThis function.

After briefly reading and analyzing the source code of the moreLikeThis
function in Solr, I concluded:


MoreLikeThis uses term vectors to rank all the terms from a document
by its frequency. According to its ranking, it will start to generate
queries, artificially, and search for documents.

So, moreLikeThis will retrieve related documents by artificially

generating queries based on most frequent terms.


There's a big problem with "most frequent terms" from
documents. Most frequent words are usually meaningless, so-called
function words, or, as people from Information Retrieval like to call
them, stopwords.
However, ignoring technical problems of the implementation of the
moreLikeThis function, this approach is very dangerous, since queries
are generated artificially based on a given document.
Writing queries for retrieving a document is a human task, and it assumes
some knowledge (the user knows what document he wants).


I advise using other approaches, depending on your expectations. For
example, you can extract similar documents just by searching for documents
with a similar title (more like this doesn't work in this case).


I hope it helps,
Best Regards,
Vitalie Scurtu
--- On Wed, 11/26/08, Plaatje, Patrick

<[EMAIL PROTECTED]> wrote:

From: Plaatje, Patrick <[EMAIL PROTECTED]>
Subject: RE:  Keyword extraction
To: solr-user@lucene.apache.org
Date: Wednesday, November 26, 2008, 10:52 AM

Hi All,
as an addition to my previous post, no interestingTerms are returned
when I execute the following url:


http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true
I get a moreLikeThis list though, any thoughts?
Best,
Patrick








--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no



Re: Keyword extraction

2008-11-26 Thread Aleksander M. Stensby

Hi there!
Well, first of all I think you have an error in your query, if I'm not
mistaken.

You say http://localhost:8080/solr/select/?q=id=18477975...
but since you are referring to the field called "id", you must say:
http://localhost:8080/solr/select/?q=id:18477975...
(use colon instead of the equals sign).
I think that will do the trick.
If not, try adding the &debugQuery=on at the end of your request url, to  
see debug output on how the query is parsed and if/how any documents are  
matched against your query.

Hope this helps.

Cheers,
 Aleksander



On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick  
<[EMAIL PROTECTED]> wrote:



Hi Aleksander,

Thanx for clearing this up. I am confident that this is a way to explore  
for me as I'm just starting to grasp the matter. Do you know why I'm not  
getting any results with the query posted earlier, then? It gives me the
following only:






Instead of delivering details of the interestingTerms.

Thanks in advance

Patrick


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: woensdag 26 november 2008 13:03
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

I do not agree with you at all. The concept of MoreLikeThis is based on  
the fundamental idea of TF-IDF weighting, and not term frequency alone.

Please take a look at:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html
As you can see, it is possible to use cut-off thresholds to
significantly reduce the number of unimportant terms, and generate
highly suitable queries based on the tf-idf frequency of the term, since
as you point out, high frequency terms alone tend to be useless for
querying, but taking the document frequency into account drastically
increases the importance of the term!


In solr, use parameters to manipulate your desired results:
http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c
For instance:
mlt.mintf - Minimum Term Frequency - the frequency below which terms  
will be ignored in the source doc.
mlt.mindf - Minimum Document Frequency - the frequency at which words  
will be ignored which do not occur in at least this many docs.

You can also set thresholds for term length etc.

Hope this gives you a better idea of things.
- Aleks

On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]>
wrote:


Dear Patrick, I had the same problem with the MoreLikeThis function.

After briefly reading and analyzing the source code of the moreLikeThis
function in Solr, I concluded:

MoreLikeThis uses term vectors to rank all the terms from a document
by its frequency. According to its ranking, it will start to generate
queries, artificially, and search for documents.

So, moreLikeThis will retrieve related documents by artificially
generating queries based on most frequent terms.

There's a big problem with "most frequent terms" from documents. Most
frequent words are usually meaningless, so-called function words,
or, as people from Information Retrieval like to call them, stopwords.
However, ignoring technical problems of the implementation of the
moreLikeThis function, this approach is very dangerous, since queries
are generated artificially based on a given document.
Writing queries for retrieving a document is a human task, and it
assumes some knowledge (the user knows what document he wants).

I advise using other approaches, depending on your expectations. For
example, you can extract similar documents just by searching for
documents with similar title (more like this doesn't work in this case).

I hope it helps,
Best Regards,
Vitalie Scurtu
--- On Wed, 11/26/08, Plaatje, Patrick <[EMAIL PROTECTED]>
wrote:
From: Plaatje, Patrick <[EMAIL PROTECTED]>
Subject: RE:  Keyword extraction
To: solr-user@lucene.apache.org
Date: Wednesday, November 26, 2008, 10:52 AM

Hi All,
as an addition to my previous post, no interestingTerms are returned
when I execute the following url:
http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true
I get a moreLikeThis list though, any thoughts?
Best,
Patrick








--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no





--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Keyword extraction

2008-11-26 Thread Aleksander M. Stensby
I do not agree with you at all. The concept of MoreLikeThis is based on  
the fundamental idea of TF-IDF weighting, and not term frequency alone.
Please take a look at:  
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html
As you can see, it is possible to use cut-off thresholds to significantly
reduce the number of unimportant terms, and generate highly suitable
queries based on the tf-idf frequency of the term, since as you point out,
high frequency terms alone tend to be useless for querying, but taking
the document frequency into account drastically increases the importance
of the term!


In solr, use parameters to manipulate your desired results:  
http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c

For instance:
mlt.mintf - Minimum Term Frequency - the frequency below which terms will  
be ignored in the source doc.
mlt.mindf - Minimum Document Frequency - the frequency at which words will  
be ignored which do not occur in at least this many docs.

You can also set thresholds for term length etc.

Hope this gives you a better idea of things.
- Aleks

On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]>  
wrote:



Dear Patrick, I had the same problem with the MoreLikeThis function.

After briefly reading and analyzing the source code of the moreLikeThis
function in Solr, I concluded:


MoreLikeThis uses term vectors to rank all the terms from a document
by its frequency. According to its ranking, it will start to generate
queries, artificially, and search for documents.

So, moreLikeThis will retrieve related documents by artificially  
generating queries based on most frequent terms.


There's a big problem with "most frequent terms" from documents. Most
frequent words are usually meaningless, so-called function words, or, as
people from Information Retrieval like to call them, stopwords. However,
ignoring technical problems of the implementation of the moreLikeThis
function, this approach is very dangerous, since queries are generated
artificially based on a given document.
Writing queries for retrieving a document is a human task, and it
assumes some knowledge (the user knows what document he wants).


I advise using other approaches, depending on your expectations. For
example, you can extract similar documents just by searching for
documents with a similar title (more like this doesn't work in this case).


I hope it helps,
Best Regards,
Vitalie Scurtu
--- On Wed, 11/26/08, Plaatje, Patrick <[EMAIL PROTECTED]>  
wrote:

From: Plaatje, Patrick <[EMAIL PROTECTED]>
Subject: RE:  Keyword extraction
To: solr-user@lucene.apache.org
Date: Wednesday, November 26, 2008, 10:52 AM

Hi All,
as an addition to my previous post, no interestingTerms are returned
when I execute the following url:
http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true
I get a moreLikeThis list though, any thoughts?
Best,
Patrick








--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Query for Distributed search -

2008-11-24 Thread Aleksander M. Stensby
If you for instance use SolrJ and the HttpSolrServer, you could add logic
to your querying, making your searches more efficient! That is partially
the idea of sharding, right? :) So if the user wants to search for a log
file in June, your application knows that June logs are stored on the
second box, and hence will redirect the search to that box. Alternatively,
if he wants to search for logs spanning two boxes, you merely add the
shards parameter to your query and include the paths to those two shards.
I'm not really sure how Solr handles the merging of results etc. and
whether the requests are done in parallel or sequentially, but I do know
that you could easily manage this on your own through Java if you want to.
(Simply set up one HttpSolrServer in your code for each shard, and search
them in parallel in separate threads => then reduce the results
afterwards.)


Have a look at http://wiki.apache.org/solr/DistributedSearch for more info.
You could also take a look at Hadoop. (http://hadoop.apache.org/)
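The "one searcher per shard, query in parallel, reduce afterwards" idea can be sketched as follows. This is a hedged, self-contained illustration: the shards are modeled as plain in-memory lists of log lines (an assumption made so the example runs standalone); with SolrJ each task would instead call query(...) on the server object for that shard:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShardSearch {
    // Searches every shard in parallel and merges the hits. Each shard here
    // is just a list of documents; a real setup would issue an HTTP query
    // per shard instead of scanning a list.
    public static List<String> searchAll(Map<String, List<String>> shards,
                                         String term) {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> docs : shards.values()) {
                futures.add(pool.submit(() -> {
                    List<String> hits = new ArrayList<>();
                    for (String doc : docs) {
                        if (doc.contains(term)) hits.add(doc);
                    }
                    return hits;
                }));
            }
            List<String> merged = new ArrayList<>();   // the "reduce" step
            for (Future<List<String>> f : futures) merged.addAll(f.get());
            return merged;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> shards = Map.of(
            "jan-apr", List.of("2009-02-01 ERROR disk full"),
            "may-aug", List.of("2009-06-12 INFO started"),
            "sep-dec", List.of("2009-10-03 ERROR timeout"));
        System.out.println(searchAll(shards, "ERROR").size()); // 2 matching log lines
    }
}
```

Solr's own shards parameter does this fan-out and merge server-side; the manual version above is only worth the effort when you need custom routing, such as the by-month redirection described in the mail.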

regards,
 Aleks

On Mon, 24 Nov 2008 06:24:51 +0100, souravm <[EMAIL PROTECTED]> wrote:


Hi,

Looking for some insight on distributed search.

Say I have an index distributed in 3 boxes and the index contains time  
and text data (typical log file). Each box has index for different  
timeline - say Box 1 for all Jan to April, Box 2 for May to August and  
Box 3 for Sep to Dec.


Now if I try to search for a text string, will the search happen in
parallel on all 3 boxes, or sequentially?


Regards,
Sourav






--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Unique id

2008-11-21 Thread Aleksander M. Stensby
I still don't understand why you want two different indexes if you want to  
return the linked information each time anyways...
I would say the easiest way is just to index all data (all columns from  
your views) into the index like this:


taskid - taskname - start - end - personid - deptid - ismanager

then you can just search like I already explained earlier. This way, you  
have already joined by queue-id when you insert it into the index and thus  
you get both results from one single search. (if you also want to have the  
ability to search on the queueID, just add a column for that.)


In general, your questions don't really have anything to do with Solr,
but with architecture, DB design, and what you want to search on.


 - A.



1.
Task(id* (int), name (string), start (timestamp), end (timestamp))

2.
Team(person_id (int), deptId (int), isManager (int))

* is primary key

In schema.xml I have








On Fri, 21 Nov 2008 11:59:56 +0100, Raghunandan Rao  
<[EMAIL PROTECTED]> wrote:



Can you also let me know how I join two search indices in one query?

That means, in this case I have two diff search indices and I need to
join by queueId and get all the tasks in one SolrQuery. I am creating
queries in Solrj.


-Original Message-
From: Raghunandan Rao [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 3:45 PM
To: solr-user@lucene.apache.org
Subject: RE: Unique id

Ok, I got your point. So I don't need to require the ID field in the second
view; I will hence remove required="true" in schema.xml. I had thought a
unique ID makes indexing easier, or is used to maintain the doc.

Thanks a lot.

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 3:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Well, in that case, what do you want to search for? If I were you, I would
make my index consist of tasks (and I assume that is what you are trying
to do).

So why don't you just use your schema.xml as you have right now, and do
the following:

Pick a person (let's say he has person_id=42 and deptId=3), get his queue
of tasks, then for each task in the queue do:
insert into index:
(id from the task), (name of the task), (id of the person), (id of the
department)
an example:
3, "this is a very important task", 42, 3
4, "this one is also important", 42, 3
5, "this one is low priority", 42, 3

And then for the next person you do the same, (person_id=58 and
deptId=5)
insert:
6, "this is about solr", 58, 5
7, "this is about lucene", 58, 5

etc.

Now you can search for all tasks in department 5 by doing "deptId:5".
If you want to search for all the tasks assigned to a specific person you
just enter the query "personId:42".
And you could also search for all tasks containing certain keywords by
doing the query "name:solr" OR "name:lucene".

Do you understand now, or is it still unclear?

- Aleks



On Fri, 21 Nov 2008 10:56:38 +0100, Raghunandan Rao
<[EMAIL PROTECTED]> wrote:


Ok. There is a common column in the two views called queueId. I query the
second view first and get all the queueIds for a person. And having the
queueIds, I get all the ids from the first view.

Sorry for missing that column earlier. I think it should make sense now.



-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 3:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

And in case that wasn't clear, the reason for it failing then would
obviously be because you define the id field with required="true", and you
try inserting a document where this field is missing...

- Aleks

On Fri, 21 Nov 2008 10:46:10 +0100, Aleksander M. Stensby
<[EMAIL PROTECTED]> wrote:


Ok, this brings me to the question: how are the two views connected to
each other (since you are indexing partly view 1 and partly view 2 into
a single index structure)?

If they are not at all connected, I believe you have made a fundamental
mistake / misunderstand the use of your index...
I assume that a Task can be assigned to a person, and your Team view
displays that person, right?

Maybe you are doing something like this:
View 1
1, somename, sometimestamp, someothertimestamp
2, someothername, somethirdtimestamp, timetamp4
...

View 2
1, 58, 0
2, 58, 1
3, 52, 0
...

I'm really confused about your database structure...
To me, it would be logical to add a team_id field to the team table, and
add a third table to link tasks to a team (or to individual persons).
Once you have that information (because I do assume there MUST be some
link there) you would do:
insert into your index:
  (id from the task), (name of the task), (id of the person assigned to
this task), (id of the department that this person works in).

I guess that you _might_ be thinking a bit wrong and trying to do
s

Re: Unique id

2008-11-21 Thread Aleksander M. Stensby
Well, In that case, what do you want to search for? If I were you, I would  
make my index consist of tasks (and I assume that is what you are trying  
to do).


So why don't you just use your schema.xml as you have right now, and do  
the following:


Pick a person (let's say he has person_id=42 and deptId=3), get his queue  
of tasks, then for each task in queue do:

insert into index:
(id from the task), (name of the task), (id of the person), (id of the
department)

an example:
3, "this is a very important task", 42, 3
4, "this one is also important", 42, 3
5, "this one is low priority", 42, 3

And then for the next person you do the same, (person_id=58 and deptId=5)
insert:
6, "this is about solr", 58, 5
7, "this is about lucene", 58, 5

etc.

Now you can search for all tasks in department 5 by doing "deptId:5".
If you want to search for all the tasks assigned to a specific person you  
just enter the query "personId:42".
And you could also search for all tasks containing certain keywords by  
doing the query "name:solr" OR "name:lucene".
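To illustrate the flattened "one document per task" layout, here is a tiny self-contained Java sketch that stands in for the index and answers the deptId:5-style lookups above. The data rows are the examples from the mail; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class TaskIndex {
    // One flattened "document": a task joined with its assignee, exactly as
    // suggested above (task id, task name, personId, deptId).
    record Task(int id, String name, int personId, int deptId) {}

    // In-memory equivalent of the Solr query "deptId:5": return every
    // document whose deptId matches.
    public static List<Task> byDept(List<Task> docs, int deptId) {
        List<Task> hits = new ArrayList<>();
        for (Task t : docs) {
            if (t.deptId() == deptId) hits.add(t);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Task> docs = List.of(
            new Task(3, "this is a very important task", 42, 3),
            new Task(6, "this is about solr", 58, 5),
            new Task(7, "this is about lucene", 58, 5));
        System.out.println(byDept(docs, 5).size()); // prints 2
    }
}
```

The point of the flattening is exactly this: once each task row already carries personId and deptId, every lookup is a single-field filter, with no join at query time.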


Do you understand now, or is it still unclear?

- Aleks



On Fri, 21 Nov 2008 10:56:38 +0100, Raghunandan Rao  
<[EMAIL PROTECTED]> wrote:



Ok. There is a common column in the two views called queueId. I query the
second view first and get all the queueIds for a person. And having the
queueIds, I get all the ids from the first view.

Sorry for missing that column earlier. I think it should make sense now.


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 3:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

And in case that wasn't clear, the reason for it failing then would
obviously be because you define the id field with required="true", and you
try inserting a document where this field is missing...

- Aleks

On Fri, 21 Nov 2008 10:46:10 +0100, Aleksander M. Stensby
<[EMAIL PROTECTED]> wrote:


Ok, this brings me to the question: how are the two views connected to
each other (since you are indexing partly view 1 and partly view 2 into
a single index structure)?

If they are not at all connected, I believe you have made a fundamental
mistake / misunderstand the use of your index...
I assume that a Task can be assigned to a person, and your Team view
displays that person, right?

Maybe you are doing something like this:
View 1
1, somename, sometimestamp, someothertimestamp
2, someothername, somethirdtimestamp, timetamp4
...

View 2
1, 58, 0
2, 58, 1
3, 52, 0
...

I'm really confused about your database structure...
To me, it would be logical to add a team_id field to the team table, and
add a third table to link tasks to a team (or to individual persons).
Once you have that information (because I do assume there MUST be some
link there) you would do:
insert into your index:
  (id from the task), (name of the task), (id of the person assigned to
this task), (id of the department that this person works in).

I guess that you _might_ be thinking a bit wrong and trying to do
something like this:
Treat each view as independent views, and insert values from each
table as separate documents in the index,
so you would do:
insert into your index:
  (id from the task), (name of the task), (no value), (no value)
which will be ok to do
  (no value), (no value), (id of the person), (id of the department)
--- which makes no sense to me...

So, can you clarify the relationship between the two views, and how you
are thinking of inserting entries into your index?

- Aleks



On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao
<[EMAIL PROTECTED]> wrote:


View structure is:

1.
Task(id* (int), name (string), start (timestamp), end (timestamp))

2.
Team(person_id (int), deptId (int), isManager (int))

* is primary key

In schema.xml I have

<uniqueKey>id</uniqueKey>


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 2:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Hello again. I'm getting a bit confused by your questions, and I believe
it would be easier for us to help you if you could post the field
definitions from your schema.xml and the structure of your two database
views.
ie.
table 1: (id (int), subject (string), ...)
table 2: (category (string), other fields ...)


So please post this and we can try to help you.

- Aleks


On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao
<[EMAIL PROTECTED]> wrote:


Thanks Erik.
If I convert that to a string then the id field defined in schema.xml would
fail as I have that as integer. If I change that to string then the first
view would fail as it is Integer there. What to do in such scenarios? Do
I need to define multiple schema.xml files or multiple unique key definitions
in the same schema? How does this work? Pls explain.

-Origin

Re: Unique id

2008-11-21 Thread Aleksander M. Stensby
And in case that wasn't clear, the reason for it failing then would  
obviously be because you define the id field with required="true", and you  
try inserting a document where this field is missing...


- Aleks

On Fri, 21 Nov 2008 10:46:10 +0100, Aleksander M. Stensby  
<[EMAIL PROTECTED]> wrote:


Ok, this brings me to the question: how are the two views connected to
each other (since you are indexing partly view 1 and partly view 2 into
a single index structure)?


If they are not at all connected I believe you have made a fundamental  
mistake / misunderstand the use of your index...
I assume that a Task can be assigned to a person, and your Team view  
displays that person, right?


Maybe you are doing something like this:
View 1
1, somename, sometimestamp, someothertimestamp
2, someothername, somethirdtimestamp, timestamp4
...

View 2
1, 58, 0
2, 58, 1
3, 52, 0
...

I'm really confused about your database structure...
To me, it would be logical to add a team_id field to the team table, and
add a third table to link tasks to a team (or to individual persons).
Once you have that information (because I do assume there MUST be some  
link there) you would do:

insert into your index:
  (id from the task), (name of the task), (id of the person assigned to  
this task), (id of the department that this person works in).


I guess that you _might_ be thinking a bit wrong and trying to do  
something like this:
Treat each view as independent views, and inserting values from each  
table as separate documents in the index

so you would do:
insert into your index:
  (id from the task), (name of the task), (no value), (no value)   
which will be ok to do
  (no value), (no value), (id of the person), (id of the department)
--- which makes no sense to me...


So, can you clarify the relationship between the two views, and how you
are thinking of inserting entries into your index?


- Aleks



On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao  
<[EMAIL PROTECTED]> wrote:



View structure is:

1.
Task(id* (int), name (string), start (timestamp), end (timestamp))

2.
Team(person_id (int), deptId (int), isManager (int))

* is primary key

In schema.xml I have

<uniqueKey>id</uniqueKey>


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 2:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Hello again. I'm getting a bit confused by your questions, and I believe

it would be easier for us to help you if you could post the field
definitions from your schema.xml and the structure of your two database

views.
ie.
table 1: (id (int), subject (string), ...)
table 2: (category (string), other fields ..)


So please post this and we can try to help you.

- Aleks


On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao
<[EMAIL PROTECTED]> wrote:


Thanks Erik.
If I convert that to a string then id field defined in schema.xml

would

fail as I have that as integer. If I change that to string then first
view would fail as it is Integer there. What to do in such scenarios?

Do

I need to define multiple schema.xml or multiple unique key

definitions

in same schema. How does this work? Pls explain.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 20, 2008 6:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

I'd suggest aggregating those three columns into a string that can
serve as the Solr uniqueKey field value.

Erik


On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote:


Basically, I am working on two views. First one has an ID column. The
second view has no unique ID column. What to do in such situations?
There are 3 other columns where I can make a composite key out of
those.
I have to index these two views now.


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 5:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Technically, no, a uniqueKey field is NOT required.  I've yet to run
into a situation where it made sense not to use one though.

As for indexing database tables - if one of your tables doesn't have

a

primary key, does it have an aggregate unique "key" of some sort?  Do
you plan on updating the rows in that table and reindexing them?
Seems like some kind of unique key would make sense for updating
documents.

But yeah, a more detailed description of your table structure and
searching needs would be helpful.

Erik


On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote:


Yes it is. You need a unique id because the add method works as an
"add or update" method. When adding a document whose ID is already
found in the index, the old document will be deleted and the new
will be added. Are you indexing two tables into the same index? Or
does one entry in the index consist of data from both tables? How
are the

Re: Unique id

2008-11-21 Thread Aleksander M. Stensby
Ok, this brings me to the question: how are the two views connected to
each other (since you are indexing partly view 1 and partly view 2 into a
single index structure)?


If they are not at all connected I believe you have made a fundamental  
mistake / misunderstand the use of your index...
I assume that a Task can be assigned to a person, and your Team view  
displays that person, right?


Maybe you are doing something like this:
View 1
1, somename, sometimestamp, someothertimestamp
2, someothername, somethirdtimestamp, timestamp4
...

View 2
1, 58, 0
2, 58, 1
3, 52, 0
...

I'm really confused about your database structure...
To me, it would be logical to add a team_id field to the team table, and
add a third table to link tasks to a team (or to individual persons).
Once you have that information (because I do assume there MUST be some  
link there) you would do:

insert into your index:
 (id from the task), (name of the task), (id of the person assigned to  
this task), (id of the department that this person works in).


I guess that you _might_ be thinking a bit wrong and trying to do  
something like this:
Treat each view as independent views, and inserting values from each table  
as separate documents in the index

so you would do:
insert into your index:
 (id from the task), (name of the task), (no value), (no value)  which  
will be ok to do
 (no value), (no value), (id of the person), (id of the department) ---
which makes no sense to me...


So, can you clarify the relationship between the two views, and how you
are thinking of inserting entries into your index?


- Aleks



On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao  
<[EMAIL PROTECTED]> wrote:



View structure is:

1.
Task(id* (int), name (string), start (timestamp), end (timestamp))

2.
Team(person_id (int), deptId (int), isManager (int))

* is primary key

In schema.xml I have

<uniqueKey>id</uniqueKey>


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 2:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Hello again. I'm getting a bit confused by your questions, and I believe

it would be easier for us to help you if you could post the field
definitions from your schema.xml and the structure of your two database

views.
ie.
table 1: (id (int), subject (string), ...)
table 2: (category (string), other fields ..)


So please post this and we can try to help you.

- Aleks


On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao
<[EMAIL PROTECTED]> wrote:


Thanks Erik.
If I convert that to a string then id field defined in schema.xml

would

fail as I have that as integer. If I change that to string then first
view would fail as it is Integer there. What to do in such scenarios?

Do

I need to define multiple schema.xml or multiple unique key

definitions

in same schema. How does this work? Pls explain.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 20, 2008 6:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

I'd suggest aggregating those three columns into a string that can
serve as the Solr uniqueKey field value.

Erik


On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote:


Basically, I am working on two views. First one has an ID column. The
second view has no unique ID column. What to do in such situations?
There are 3 other columns where I can make a composite key out of
those.
I have to index these two views now.


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 5:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Technically, no, a uniqueKey field is NOT required.  I've yet to run
into a situation where it made sense not to use one though.

As for indexing database tables - if one of your tables doesn't have

a

primary key, does it have an aggregate unique "key" of some sort?  Do
you plan on updating the rows in that table and reindexing them?
Seems like some kind of unique key would make sense for updating
documents.

But yeah, a more detailed description of your table structure and
searching needs would be helpful.

        Erik


On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote:


Yes it is. You need a unique id because the add method works as and
"add or update" method. When adding a document whose ID is already
found in the index, the old document will be deleted and the new
will be added. Are you indexing two tables into the same index? Or
does one entry in the index consist of data from both tables? How
are these linked together without an ID?

- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao

<[EMAIL PROTECTED]

wrote:



Hi,

Is the uniqueKey in schema.xml really required?


Reason is, I am indexing two tables and I have id as unique key in
schema.xml but id field is not there in one of the tables and
indexing
fails. 
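The denormalization Aleks describes (one flat document per task, with the person and department columns joined in) can be sketched in plain Java. The field names and the plain Map standing in for a Solr document are illustrative assumptions, not SolrJ API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DenormalizeExample {
    // Build one flat "document" per task, joining in the data from the
    // person and department views, so the index needs only one schema.
    static Map<String, Object> toDocument(int taskId, String taskName,
                                          int personId, int deptId) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", taskId);          // the task id remains the unique key
        doc.put("name", taskName);
        doc.put("person_id", personId); // joined from the Team view
        doc.put("dept_id", deptId);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toDocument(1, "somename", 58, 7));
    }
}
```

Each row of the Task view then yields exactly one document, which avoids the "two independent views in one index" problem discussed above.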

Re: solrQueryParser does not take effect - nightly build

2008-11-21 Thread Aleksander M. Stensby
That sounds a bit strange. Did you do the changes in the schema.xml before  
starting the server? Because if you change it while it is running, it will  
by default delete and replace the file (discarding any changes you make).  
In other words, make sure the server is not running, make your changes and  
then start up the server. Apart from that, I can't really see any reason  
for this to not work...


- Aleks


On Thu, 20 Nov 2008 22:03:30 +0100, ashokc <[EMAIL PROTECTED]> wrote:



Hi,

I have set

<solrQueryParser defaultOperator="AND"/>
but it is not taking effect. It continues to take it as OR. I am working
with the latest nightly build 11/20/2008

For a query like

term1 term2

Debug shows

<str name="parsedquery">content:term1 content:term2</str>

Bug?

Thanks

- ashok
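As a rough illustration of what defaultOperator changes, here is a toy expansion of bare terms into a fielded boolean query. This sketches the observable effect only, not Solr's actual parser:

```java
public class DefaultOperatorDemo {
    // Join bare terms with the schema's default operator, as
    // <solrQueryParser defaultOperator="AND"/> is meant to make Solr do.
    static String expand(String field, String query, String op) {
        String[] terms = query.trim().split("\\s+");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) sb.append(' ').append(op).append(' ');
            sb.append(field).append(':').append(terms[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // With the default OR, "term1 term2" matches either term;
        // with defaultOperator="AND" both terms are required.
        System.out.println(expand("content", "term1 term2", "OR"));
        System.out.println(expand("content", "term1 term2", "AND"));
    }
}
```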





Re: Unique id

2008-11-21 Thread Aleksander M. Stensby
Hello again. I'm getting a bit confused by your questions, and I believe  
it would be easier for us to help you if you could post the field  
definitions from your schema.xml and the structure of your two database  
views.

ie.
table 1: (id (int), subject (string), ...)
table 2: (category (string), other fields ..)


So please post this and we can try to help you.

- Aleks


On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao  
<[EMAIL PROTECTED]> wrote:



Thanks Erik.
If I convert that to a string then id field defined in schema.xml would
fail as I have that as integer. If I change that to string then first
view would fail as it is Integer there. What to do in such scenarios? Do
I need to define multiple schema.xml or multiple unique key definitions
in same schema. How does this work? Pls explain.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 20, 2008 6:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

I'd suggest aggregating those three columns into a string that can
serve as the Solr uniqueKey field value.

Erik


On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote:


Basically, I am working on two views. First one has an ID column. The
second view has no unique ID column. What to do in such situations?
There are 3 other columns where I can make a composite key out of
those.
I have to index these two views now.


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 5:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Technically, no, a uniqueKey field is NOT required.  I've yet to run
into a situation where it made sense not to use one though.

As for indexing database tables - if one of your tables doesn't have a
primary key, does it have an aggregate unique "key" of some sort?  Do
you plan on updating the rows in that table and reindexing them?
Seems like some kind of unique key would make sense for updating
documents.

But yeah, a more detailed description of your table structure and
searching needs would be helpful.

Erik


On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote:


Yes it is. You need a unique id because the add method works as an
"add or update" method. When adding a document whose ID is already
found in the index, the old document will be deleted and the new
will be added. Are you indexing two tables into the same index? Or
does one entry in the index consist of data from both tables? How
are these linked together without an ID?

- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao

<[EMAIL PROTECTED]

wrote:



Hi,

Is the uniqueKey in schema.xml really required?


Reason is, I am indexing two tables and I have id as unique key in
schema.xml but id field is not there in one of the tables and
indexing
fails. Do I really require this unique field for Solr to index it
better
or can I do away with this?


Thanks,

Rahgu





--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no
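Erik's suggestion of aggregating columns into one uniqueKey string could look like this. The separator and the view-name prefix are illustrative assumptions; the prefix also keeps ids from the two views from colliding:

```java
public class CompositeKeyExample {
    // Concatenate the columns that together identify a row, so the result
    // can be stored in a single string uniqueKey field in schema.xml.
    static String compositeKey(String source, Object... columns) {
        StringBuilder sb = new StringBuilder(source);
        for (Object c : columns) sb.append(':').append(c);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(compositeKey("task", 1));        // task id 1
        System.out.println(compositeKey("team", 58, 7, 0)); // composite of 3 columns
    }
}
```

This also answers the integer-vs-string concern above: once the uniqueKey field is a string, both views can feed it.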




Re: Unique id

2008-11-19 Thread Aleksander M. Stensby
Ok, but how do you map your table structure to the index? As far as I can  
understand, the two tables have different structure, so why/how do you map
two different datastructures onto a single index? Are the two tables  
connected in some way? If so, you could make your index structure reflect  
the union of both tables and just make one insertion into the index per  
entry of the two tables.


Maybe you could post the table structure so that I can get a better  
understanding of your use-case...


- Aleks

On Wed, 19 Nov 2008 11:25:56 +0100, Raghunandan Rao  
<[EMAIL PROTECTED]> wrote:



Ok got it.
I am indexing two tables differently. I am using Solrj to index with
@Field annotation. I make two queries initially and fetch the data from
two tables and index them separately. But what if the ids in two tables
are same? That means documents with same id will be deleted when doing
update.

How does this work? Please explain.

Thanks.

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 3:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Yes it is. You need a unique id because the add method works as an "add

or update" method. When adding a document whose ID is already found in
the
index, the old document will be deleted and the new will be added. Are
you
indexing two tables into the same index? Or does one entry in the index

consist of data from both tables? How are these linked together without
an
ID?

- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao
<[EMAIL PROTECTED]> wrote:


Hi,

Is the uniqueKey in schema.xml really required?


Reason is, I am indexing two tables and I have id as unique key in
schema.xml but id field is not there in one of the tables and indexing
fails. Do I really require this unique field for Solr to index it

better

or can I do away with this?


Thanks,

Rahgu






Re: Unique id

2008-11-19 Thread Aleksander M. Stensby
Yes it is. You need a unique id because the add method works as an "add
or update" method. When adding a document whose ID is already found in the  
index, the old document will be deleted and the new will be added. Are you  
indexing two tables into the same index? Or does one entry in the index  
consist of data from both tables? How are these linked together without an  
ID?


- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao  
<[EMAIL PROTECTED]> wrote:



Hi,

Is the uniqueKey in schema.xml really required?


Reason is, I am indexing two tables and I have id as unique key in
schema.xml but id field is not there in one of the tables and indexing
fails. Do I really require this unique field for Solr to index it better
or can I do away with this?


Thanks,

Rahgu





--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Use SOLR like the "MySQL LIKE"

2008-11-18 Thread Aleksander M. Stensby

Ah, okay!
Well, then I suggest you index the field in two different ways if you want  
both possible ways of searching. One, where you treat the entire name as  
one token (in lowercase) (then you can search for avera* and match on for  
instance "average joe" etc.) And then another field where you tokenize on  
whitespace for instance, if you want/need that possibility as well. Look at
the solr copy fields and try it out, it works like a charm :)


Cheers,
 Aleksander

On Tue, 18 Nov 2008 10:40:24 +0100, Carsten L <[EMAIL PROTECTED]> wrote:



Thanks for the quick reply!

It is supposed to work a little like the Google Suggest or field
autocompletion.

I know I mentioned email and userid, but the problem lies with the name
field, because of the whitespaces in combination with the wildcard.

I looked at the solr.WordDelimiterFilterFactory, but it does not mention
anything about whitespaces - or wildcards.

A quick brushup:
I would like to mimic the LIKE functionality from MySQL using the  
wildcards

in the end of the searchquery.
In MySQL whitespaces are treated as characters, not "splitters".


Aleksander M. Stensby wrote:


Hi there,

You should use LowerCaseTokenizerFactory as you point out yourself. As  
far
as I know, the StandardTokenizer "recognizes email addresses and  
internet

hostnames as one token". In your case, I guess you want an email, say
"[EMAIL PROTECTED]" to be split into four tokens: average joe  
apache

org, or something like that, which would indeed allow you to search for
"joe" or "average j*" and match. To do so, you could use the
WordDelimiterFilterFactory and split on intra-word delimiters (I think  
the

defaults here are non-alphanumeric chars).

Take a look at  
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

for more info on tokenizers and filters.

cheers,
  Aleks

On Tue, 18 Nov 2008 08:35:31 +0100, Carsten L <[EMAIL PROTECTED]>  
wrote:




Hello.

The data:
I have a dataset containing ~500.000 documents.
In each document there is an email, a name and a user ID.

The problem:
I would like to be able to search in it, but it should be like the  
"MySQL

LIKE".

So when a user enters the search term: "carsten", then the query looks
like:
"name:(carsten) OR name:(carsten*) OR email:(carsten) OR
email:(carsten*) OR userid:(carsten) OR userid:(carsten*)"

Then it should match:
carsten l
carsten larsen
Carsten Larsen
Carsten
CARSTEN
etc.

And when the user enters the term: "carsten l" the query looks like:
"name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR
email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)"

Then it should match:
carsten l
carsten larsen
Carsten Larsen

Or written to the MySQL syntax: "... WHERE `name` LIKE 'carsten%'  OR
`email` LIKE 'carsten%' OR `userid` LIKE 'carsten%'..."

I know that I need to use the "solr.LowerCaseTokenizerFactory" on my  
name

and email field, to ensure case-insensitive behavior.
The problem seems to be the wildcards and the whitespaces.




--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no








--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no
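The effect of indexing the whole name as a single lowercased token (e.g. a keyword-style tokenizer plus a lowercase filter, as Aleks suggests) can be illustrated outside Solr; a trailing-wildcard query then behaves like a case-insensitive LIKE 'carsten l%':

```java
import java.util.List;

public class PrefixLikeDemo {
    // When the whole field is one lowercased token, a trailing-wildcard
    // query is equivalent to a case-insensitive prefix test, whitespace
    // included -- the MySQL LIKE behavior the thread asks for.
    static boolean matches(String fieldValue, String prefixQuery) {
        return fieldValue.toLowerCase().startsWith(prefixQuery.toLowerCase());
    }

    public static void main(String[] args) {
        for (String n : List.of("carsten l", "carsten larsen",
                                "Carsten Larsen", "Joe Average")) {
            System.out.println(n + " -> " + matches(n, "carsten l"));
        }
    }
}
```

A whitespace-tokenized copy of the same field (via copyField) would then cover the per-word searches.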


Re: Use SOLR like the "MySQL LIKE"

2008-11-18 Thread Aleksander M. Stensby

Hi there,

You should use LowerCaseTokenizerFactory as you point out yourself. As far  
as I know, the StandardTokenizer "recognizes email addresses and internet  
hostnames as one token". In your case, I guess you want an email, say  
"[EMAIL PROTECTED]" to be split into four tokens: average joe apache  
org, or something like that, which would indeed allow you to search for  
"joe" or "average j*" and match. To do so, you could use the  
WordDelimiterFilterFactory and split on intra-word delimiters (I think the  
defaults here are non-alphanumeric chars).


Take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters  
for more info on tokenizers and filters.


cheers,
 Aleks

On Tue, 18 Nov 2008 08:35:31 +0100, Carsten L <[EMAIL PROTECTED]> wrote:



Hello.

The data:
I have a dataset containing ~500.000 documents.
In each document there is an email, a name and a user ID.

The problem:
I would like to be able to search in it, but it should be like the "MySQL
LIKE".

So when a user enters the search term: "carsten", then the query looks  
like:

"name:(carsten) OR name:(carsten*) OR email:(carsten) OR
email:(carsten*) OR userid:(carsten) OR userid:(carsten*)"

Then it should match:
carsten l
carsten larsen
Carsten Larsen
Carsten
CARSTEN
etc.

And when the user enters the term: "carsten l" the query looks like:
"name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR
email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)"

Then it should match:
carsten l
carsten larsen
Carsten Larsen

Or written to the MySQL syntax: "... WHERE `name` LIKE 'carsten%'  OR
`email` LIKE 'carsten%' OR `userid` LIKE 'carsten%'..."

I know that I need to use the "solr.LowerCaseTokenizerFactory" on my name
and email field, to ensure case-insensitive behavior.
The problem seems to be the wildcards and the whitespaces.




--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Calculating peaks - solrj support for facet.date?

2008-11-13 Thread Aleksander M. Stensby

As Erik said, you can just set the parameters yourself
 SolrQuery query = new SolrQuery(...);
 query.set(FacetParams.FACET_DATE, ...);
 etc.

You'll find all facet-related parameters in the FacetParams interface,  
located in the org.apache.solr.common.params package.


- Aleks

On Fri, 07 Nov 2008 14:26:56 +0100, Erik Hatcher  
<[EMAIL PROTECTED]> wrote:




On Nov 7, 2008, at 7:23 AM, [EMAIL PROTECTED] wrote:
Sorry, but I have one more question. Does the java client solrj support  
facet.date?


Yeah, but it doesn't have explicit setters for it.  A SolrQuery is also  
a ModifiableSolrParams - so you can call the add/set methods on it using  
the same keys used with HTTP requests.


    Erik






--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
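Since SolrQuery extends ModifiableSolrParams, setting facet.date amounts to plain key/value pairs; a toy param map shows the HTTP parameters that result. The date values are illustrative, and in SolrJ you would call query.set(key, value) with the FacetParams constants instead:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetDateParamsDemo {
    // Turn a param map into the query string Solr receives over HTTP.
    static String toQueryString(Map<String, String> params) {
        StringBuilder qs = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (qs.length() > 0) qs.append('&');
            qs.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return qs.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("facet", "true");
        p.put("facet.date", "timestamp");          // FacetParams.FACET_DATE
        p.put("facet.date.start", "NOW/DAY-30DAYS");
        p.put("facet.date.end", "NOW/DAY");
        p.put("facet.date.gap", "+1DAY");
        System.out.println(toQueryString(p));
    }
}
```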


Re: EmbeddedSolrServer and the MultiCore functionality

2008-09-24 Thread Aleksander M. Stensby

Okay, sounds fair.
Well, my reason for having multiple shards was the presumption that
it would be more effective to be able to search single shards when
needed (if each shard contains, let's say, 30 million entries) and then,
when the time comes, migrate one of the shards to a different node. But I
guess the gain in performance is not significant and I should rather have
just one shard per node. Or?


Best regards and thanks for your answer,
 Aleksander

On Tue, 23 Sep 2008 16:57:08 +0200, Ryan McKinley <[EMAIL PROTECTED]>  
wrote:




If i have solr up and running and do something like this:
   query.set("shards", "localhost:8080/solr/core0,localhost:8080/solr/core1");

I will get the results from both cores, obviously...

But is there a way to do this without using shards and accessing the  
cores through http?
I presume it would/should be possible to do the same thing directly  
against the cores, but my question is really if this has been  
implemented already / is it possible?




not implemented...

Check line 384 of SearchHandler.java
   SolrServer server = new CommonsHttpSolrServer(url, client);

it defaults to CommonsHttpSolrServer.

This could easily change to EmbeddedSolrServer, but i'm not sure it is a  
very common usecase...


why would you have multiple shards on the same machine?

ryan







--
Aleksander M. Stensby
Senior Software Developer
Integrasco A/S
+47 41 22 82 72
[EMAIL PROTECTED]


EmbeddedSolrServer and the MultiCore functionality

2008-09-23 Thread Aleksander M. Stensby
Hello everyone, I'm new to Solr (have been using Lucene for a few years  
now). We are looking into Solr and have heard many good things about the  
project:)


I have a few questions regarding the EmbeddedSolrServer in Solrj and the  
MultiCore features... I've tried to find answers to this in the archives  
but have not succeeded.
The thing is, I want to be able to use the Embedded server to access  
multiple cores on one machine, and I would like to at least have the  
possibility to access the lucene indexes without http. In particular I'm  
wondering if it is possible to do the "shards" (distributed search)  
approach using the embedded server, without using http requests.


lets say I register 2 cores to a container and init my embedded server  
like this:

CoreContainer container = new CoreContainer();
container.register("core1", core1, false);
container.register("core2", core2, false);
server = new EmbeddedSolrServer(container, "core1");
Then queries performed on my server will return results from core1, and
if I do server = new EmbeddedSolrServer(container, "core2") the results will
come from core2.


If i have solr up and running and do something like this:
query.set("shards", "localhost:8080/solr/core0,localhost:8080/solr/core1");

I will get the results from both cores, obviously...

But is there a way to do this without using shards and accessing the cores  
through http?
I presume it would/should be possible to do the same thing directly  
against the cores, but my question is really if this has been implemented  
already / is it possible?



Thanks in advance for any replies!

Best regards,
 Aleksander


--
Aleksander M. Stensby
Senior Software Developer
Integrasco A/S
+47 41 22 82 72
[EMAIL PROTECTED]
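The per-core fan-out and merge that the shards parameter performs over HTTP can be sketched with toy in-memory "cores" merged by score. This is an illustration of the idea only, not the SearchHandler code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ShardMergeDemo {
    record Hit(String id, double score) {}

    // Query each "core" independently, then merge the partial result
    // lists by descending score -- conceptually what distributed search
    // does after fanning the query out to the shards.
    static List<Hit> searchAll(List<Map<String, Double>> cores, int rows) {
        List<Hit> merged = new ArrayList<>();
        for (Map<String, Double> core : cores) {
            core.forEach((id, score) -> merged.add(new Hit(id, score)));
        }
        merged.sort(Comparator.comparingDouble(Hit::score).reversed());
        return merged.subList(0, Math.min(rows, merged.size()));
    }

    public static void main(String[] args) {
        System.out.println(searchAll(List.of(
                Map.of("doc1", 0.9, "doc2", 0.4),   // core0
                Map.of("doc3", 0.7)),               // core1
                2));
    }
}
```

An embedded, non-HTTP variant would replace each map lookup with a query against one core's local index; the merge step stays the same.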