Re: Filter multivalue fields from search result

2010-07-12 Thread Alex J. G. Burzyński

Hi,

So if those are separate documents, how should I handle paging? With two
separate queries?
The first to return all matching course-event pairs, and the second to get
the courses for a given page?


Is this common design described in detail somewhere?

Thanks,
Alex

On 2010-07-09 01:50, Lance Norskog wrote:

Yes, denormalizing the index into separate (name,town) pairs is the
common design for this problem.
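
A minimal sketch of what that denormalization could look like on the indexing side, in Python. It assumes the town and date values of a course are parallel lists in the same order; the course_id field and the "course-event" id format are hypothetical names chosen for illustration, not something Solr requires.

```python
# Multivalued course documents, taken from the example table in the question.
courses = [
    {"id": "1", "name": "Microsoft Excel",
     "town": ["London", "Glasgow", "Leeds"],
     "date": ["2010-08-20", "2010-08-24", "2010-08-28"]},
    {"id": "2", "name": "Microsoft Word",
     "town": ["Aberdeen", "Reading", "London"],
     "date": ["2010-08-21", "2010-08-25", "2010-08-29"]},
    {"id": "3", "name": "Microsoft Powerpoint",
     "town": ["Birmingham", "Leeds"],
     "date": ["2010-08-22", "2010-08-26"]},
]


def denormalize(course):
    """Yield one flat document per (town, date) event of a course.

    Assumption: course["town"] and course["date"] are parallel lists.
    """
    events = zip(course["town"], course["date"])
    for n, (town, date) in enumerate(events, start=1):
        yield {
            "id": f"{course['id']}-{n}",   # hypothetical unique key
            "course_id": course["id"],     # hypothetical field to group on later
            "name": course["name"],
            "town": town,
            "date": date,
        }


flat_docs = [doc for course in courses for doc in denormalize(course)]

# Each flat document now carries exactly one town, so a match on Leeds
# no longer drags the London or Glasgow events along with it.
leeds_ids = [doc["id"] for doc in flat_docs if doc["town"] == "Leeds"]
print(leeds_ids)
```

The flat documents would then be indexed in place of the multivalued ones, with town and date no longer declared multiValued.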

Re: Filter multivalue fields from search result

2010-07-12 Thread Alex J. G. Burzyński

Hi Chantal,

The paging problem I asked about is this: with course-event pairs as
documents, specifying rows limits the number of pairs returned, not the
number of courses.


+-------+----------------------+------------+------------+
| id-id | name                 | town       | date       |
+-------+----------------------+------------+------------+
| 1-1   | Microsoft Excel      | London     | 2010-08-20 |
| 1-2   | Microsoft Excel      | Glasgow    | 2010-08-24 |
| 1-3   | Microsoft Excel      | Leeds      | 2010-08-28 |
| 2-1   | Microsoft Word       | Aberdeen   | 2010-08-21 |
| 2-2   | Microsoft Word       | Reading    | 2010-08-25 |
| 2-3   | Microsoft Word       | London     | 2010-08-29 |
| 3-1   | Microsoft Powerpoint | Birmingham | 2010-08-22 |
| 3-2   | Microsoft Powerpoint | Leeds      | 2010-08-26 |
| 3-3   | Microsoft Powerpoint | Leeds      | 2010-08-30 |
+-------+----------------------+------------+------------+


And from the UI point of view I'm returning fewer courses than events,
which is why I asked about paging.


The search for q=name:Microsoft town:Leeds with rows=2 should return:
1-3, 3-2 and 3-3

But 3-3 will obviously be on page 2.
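
To make the mismatch concrete, here is a rough client-side sketch of the two-step idea in Python: fetch the ids of all matching pairs, group them by course, and page over courses instead of pairs. The function name is hypothetical, and it assumes the pair ids have the "course-event" form shown in the table and that all matching pairs were fetched in one request, which may be expensive on a large index.

```python
from itertools import islice


def page_of_courses(pair_ids, page, rows):
    """Group matching pair ids by course and return one page of courses.

    pair_ids: ids such as "3-2" (course 3, event 2), in result order.
    page:     zero-based page number, counted in courses.
    rows:     number of courses per page.
    """
    by_course = {}
    for pair_id in pair_ids:
        course_id = pair_id.split("-", 1)[0]
        by_course.setdefault(course_id, []).append(pair_id)
    start = page * rows
    # dicts keep insertion order, so courses stay in result order
    return dict(islice(by_course.items(), start, start + rows))


matches = ["1-3", "3-2", "3-3"]
print(page_of_courses(matches, page=0, rows=2))
```

With rows=2 counted in courses, the first page holds courses 1 and 3 with all three matching events, which is the result wanted above.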

I hope that makes my question clearer.

Thanks,
Alex


On 2010-07-12 10:26, Chantal Ackermann wrote:

Hi Alex,

I think you have to explain the complete use case. Paging is done by
specifying the parameter start (and rows if you want to have more or
less than 10 hits per page). For each page you need of course a new
query, but the queries differ only in the parameter value start (first
page start=0, second page start=10 etc. if rows=10). The other
parameters remain the same.
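
As an illustration only, the per-page requests could be generated like this in Python. The helper name and the example query are made up for the sketch; start and rows are the parameters described above.

```python
from urllib.parse import urlencode


def page_params(query, page, rows=10):
    """Build the query string for a zero-based page number.

    Only start changes from page to page; q and rows stay the same.
    """
    return urlencode({"q": query, "start": page * rows, "rows": rows})


for page in range(3):
    print(page_params("town:Leeds", page))
```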

You should also have a look at facets. They might help you to get a list
of the values of your multi valued fields that you can display in the
UI, allowing the user to drill down the results further.

Chantal

Filter multivalue fields from search result

2010-07-08 Thread Alex J. G. Burzyński
Hi,

Is it possible to remove from search results the multivalued fields that
don't pass the search criteria?

My schema is defined as:

<!-- course_id -->
<field name="id" type="string" indexed="true" stored="true"
required="true" />
<!-- course_name -->
<field name="name" type="string" indexed="true" stored="true"/>
<!-- events.event_town -->
<field name="town" type="string" indexed="true" stored="true"
multiValued="true"/>
<!-- events.event_date -->
<field name="date" type="tdate" indexed="true" stored="true"
multiValued="true"/>

And example docs are:

+----+----------------------+------------+------------+
| id | name                 | town       | date       |
+----+----------------------+------------+------------+
| 1  | Microsoft Excel      | London     | 2010-08-20 |
|    |                      | Glasgow    | 2010-08-24 |
|    |                      | Leeds      | 2010-08-28 |
| 2  | Microsoft Word       | Aberdeen   | 2010-08-21 |
|    |                      | Reading    | 2010-08-25 |
|    |                      | London     | 2010-08-29 |
| 3  | Microsoft Powerpoint | Birmingham | 2010-08-22 |
|    |                      | Leeds      | 2010-08-26 |
+----+----------------------+------------+------------+

so the query for q=name:Microsoft town:Leeds returns docs 1 and 3.

How would I remove London/Glasgow from doc 1 and Birmingham from doc 3?

Or is it that I should create separate doc for each name-event?

Thanks,
Alex


Solr Spellcheck on Large index size

2010-04-27 Thread Kyle J G

I am trying to create a spell checker for my company's website.

Currently there are approximately 29 million documents in the index.

When I try to create the spelling index, it just seems to skip over the
command.

My fields in schema.xml look like the following:

<field name="ID" type="int" indexed="true" stored="true" required="true" />
<field name="LineCode" type="string" indexed="true" stored="true" required="true" />
<field name="PartNumber" type="string" indexed="true" stored="true" required="true" />
<field name="CategoryName" type="string" indexed="true" stored="true" required="true" />
<field name="PartTerminologyName" type="string" indexed="true" stored="true" required="true" />
<field name="Year" type="int" indexed="true" stored="true" required="true" />
<field name="Make" type="string" indexed="true" stored="true" required="true" />
<field name="Model" type="string" indexed="true" stored="true" required="true" />
<field name="Submodel" type="string" indexed="true" stored="true" />
<field name="EngType" type="string" indexed="true" stored="true" required="true" />
<field name="Liter" type="string" indexed="true" stored="true" required="true" />
<field name="CC" type="int" indexed="true" stored="true" required="true" />
<field name="CID" type="int" indexed="true" stored="true" required="true" />
<field name="Fuel" type="string" indexed="true" stored="true" required="true" />
<field name="FuelDel" type="string" indexed="true" stored="true" required="true" />
<field name="Asp" type="string" indexed="true" stored="true" required="true" />
<field name="EngVin" type="string" indexed="true" stored="true" required="true" />
<field name="EngDesg" type="string" indexed="true" stored="true" required="true" />

And copying fields as such: 
<copyField source="Year" dest="text"/>
<copyField source="Make" dest="text"/>
<copyField source="Model" dest="text"/>
<copyField source="Fuel" dest="text"/>
<copyField source="CategoryName" dest="text"/>
<copyField source="text" dest="spell"/>

My spell checker config looks like the following: 

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

<!-- <str name="queryAnalyzerFieldType">textSpell</str> -->

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">spell</str>
  <str name="buildOnCommit">true</str>
  <str name="buildOnOptimize">true</str>
  <str name="spellcheckIndexDir">C:\Users\kyleg\apache-solr-1.4.0\productGroups\solr\data\spellchecker</str>
</lst>

<!-- a spellchecker that uses a different distance measure
<lst name="spellchecker">
  <str name="name">jarowinkler</str>
  <str name="field">spell</str>
  <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  <str name="spellcheckIndexDir">./spellchecker2</str>
</lst>
 -->

<!-- a file based spell checker -->
<lst name="spellchecker">
  <str name="classname">solr.FileBasedSpellChecker</str>
  <str name="name">file</str>
  <str name="sourceLocation">spellings.txt</str>
  <str name="characterEncoding">UTF-8</str>
  <str name="spellcheckIndexDir">./spellcheckerFile</str>
</lst>
</searchComponent>


The command that I am sending to try to build looks like the following:
http://localhost:8983/solr/spell/?q=ACORA&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.dictionary=default&spellcheck.build=true&spellcheck.collate=true&spellcheck.limit=5
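
For reference, a small Python sketch of assembling that request so that every parameter is joined with & and URL-encoded. The parameter names and values are copied as they appear in this message, not checked against the Solr documentation; the script only builds the URL and does not send it.

```python
from urllib.parse import urlencode

# Parameters exactly as listed in the build command of this message.
params = {
    "q": "ACORA",
    "version": "2.2",
    "start": 0,
    "rows": 10,
    "indent": "on",
    "spellcheck": "true",
    "spellcheck.dictionary": "default",
    "spellcheck.build": "true",
    "spellcheck.collate": "true",
    "spellcheck.limit": 5,
}

build_url = "http://localhost:8983/solr/spell/?" + urlencode(params)
print(build_url)
```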


I have also tried reducing the size of the index to around 10,000 documents,
still with no luck.

Any help would be appreciated.

Thank you,
Kyle


RE: Solr Replication

2009-08-27 Thread J G

We have multiple solr webapps all running from the same WAR file. Each webapp 
is running under the same Tomcat container and I consider each webapp the same 
thing as a slice (or instance). I've configured the Tomcat container to 
enable JMX and when I connect using JConsole I only see the replication handler 
for one of the webapps in the server. I was under the impression each webapp 
gets its own replication handler. Is this not true? 

It would be nice to be able to have a JMX MBean for each replication handler in 
the container so we can get all the same replication information using JMX as 
in using the replication admin page for each web app.

Thanks.





 From: noble.p...@corp.aol.com
 Date: Thu, 27 Aug 2009 13:04:38 +0530
 Subject: Re: Solr Replication
 To: solr-user@lucene.apache.org
 
 when you say a slice you mean one instance of solr? So your JMX
 console is connecting to only one solr?
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com


RE: Solr Replication

2009-08-26 Thread J G

Thanks for the response.

It's interesting because when I run jconsole all I can see is one
ReplicationHandler JMX MBean. It looks like it is defaulting to the first slice
it finds on its path. Is there any way to have multiple replication handlers, or
at least to obtain replication information per slice/instance via JMX, the way
you can see attributes for each slice/instance via each replication admin JSP
page?

Thanks again.

 From: noble.p...@corp.aol.com
 Date: Wed, 26 Aug 2009 11:05:34 +0530
 Subject: Re: Solr Replication
 To: solr-user@lucene.apache.org
 
 The ReplicationHandler is not enforced as a singleton, but for all
 practical purposes it is a singleton for one core.
 
 If an instance (a slice, as you say) is set up as a repeater, it can
 act as both a master and a slave.
 
 With a repeater, the configuration should be as follows:
 
 MASTER
   |_SLAVE (I am a slave of MASTER)
   |
 REPEATER (I am a slave of MASTER and master to my slaves )
  |
  |
 REPEATER_SLAVE( of REPEATER)
 
 
 The point is that REPEATER will have a slave section with a masterUrl
 that points to MASTER, and REPEATER_SLAVE will have a slave section
 with a masterUrl that points to REPEATER.
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com


Solr Replication

2009-08-25 Thread J G

Hello,

We are running multiple slices in our environment. I have enabled JMX and I am 
inspecting the replication handler mbean to obtain some information about the 
master/slave configuration for replication. Is the replication handler mbean a 
singleton? I only see one mbean for the entire server and it's picking an 
arbitrary slice to report on. So I'm curious if every slice gets its own 
replication handler mbean? This is important because I have no way of knowing 
in this specific server any information about the other slices, in particular, 
information about the master/slave value for the other slices.

Reading through the Solr 1.4 replication strategy, I saw that a slice can be
configured to be both a master and a slave, i.e. a repeater. I'm wondering how
repeaters work. Let's say I have a slice named 'A', the master is on
server 1 and the slave is on server 2: how do these two servers
communicate to replicate? Looking at the JMX information I have in the MBean,
both isSlave and isMaster are set to true for my repeater, so how does this
Solr slice know whether it's the master or the slave? I'm a bit confused.

Thanks.





Obtaining SOLR index size on disk

2009-07-17 Thread J G

Hello,

Is it possible to obtain the SOLR index size on disk through the SOLR API? I've 
read through the docs and mailing list questions but can't seem to find the 
answer.

Any help is appreciated.

Thanks.




JMX monitoring for multiple SOLR instances

2009-07-14 Thread J G

Hi,

If I want to run multiple SOLR war files in Tomcat, is it possible to monitor
each of the SOLR instances individually through JMX? Has anyone attempted this
before? Also, what are the implications (e.g. performance) of running multiple
SOLR instances in the same Tomcat server?

Thanks.





solr jmx connection

2009-07-10 Thread J G

 Hello,

I have a SOLR JMX connection issue. I am running my JMX MBeanServer through
Tomcat, meaning I am using Tomcat's MBeanServer rather than any other
MBeanServer implementation.
I am having a hard time trying to figure out the correct JMX Service URL on my
localhost for accessing the SOLR MBeans. My current configuration consists
of the following:

JMX Service url = localhost:9000/jmxrmi

So I have configured JMX to run on port 9000 on Tomcat on my localhost, and
using the above service URL I can access the Tomcat JMX MBeanServer and get
related JVM object information (e.g. I can access the MemoryMXBean object).

However, I am having a harder time trying to access the SOLR MBeans. First, I 
could have the wrong service URL. Second, I'm confused as to which MBeans SOLR 
provides.

You might be asking why I am creating my own client rather than using JConsole,
but JConsole doesn't provide the features I need.

Anyone with any knowledge or code snippets would be a huge help!

Thank you for your time!

Regards


