Re: Tag Cloud Generation Problem

2010-04-08 Thread Markus Jelsma
The faceting engine can do this job.



On Thursday 08 April 2010 10:16:09 Ninad Raut wrote:
 Hi,
 
 I have a business use case wherein I have to generate a tag cloud for words
 with frequency greater than a specified threshold.
 
 The way I store records in Solr is:
 For every Solr document (which includes content) I store a multivalued entry
 of buzzwords with their frequency.
 
 The technical problem I face is:
 While generating a tag cloud I do not know the buzzwords beforehand.
 Moreover, I want the frequency total for a buzzword across documents.
 
 In SQL the way to do it is:
 
 Select buzzWord, sum(frequency)
 from Verbatim
 group by buzzWord
 having count(frequency) > thresholdValue
 
 Is there a similar way I can query Solr? Even a workaround solution to
 this will do.
 
 
 Thanks.
 
 Regards,
 Ninad R
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Tag Cloud Generation Problem

2010-04-08 Thread Markus Jelsma
Hi,


It's simpler than you might think :)

?q=*:*&facet=true&facet.field=buzzWord&rows=0

This will retrieve an overall facet count (useful for navigation and tag cloud 
generation) but doesn't return the documents themselves. Check the faceting 
wiki [1] for more information.



[1]: http://wiki.apache.org/solr/SimpleFacetParameters
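
For the threshold part of the question, a facet.mincount parameter can also be 
added. A sketch (assuming buzzWord is an untokenized string field, and noting 
that facet counts are document counts, not the summed per-document frequencies 
of the SQL example):

?q=*:*&rows=0&facet=true&facet.field=buzzWord&facet.mincount=10&facet.limit=-1

Here facet.mincount=10 stands in for the threshold value, and facet.limit=-1 
removes the default cap on the number of returned constraints.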


Cheers




On Thursday 08 April 2010 10:47:19 Ninad Raut wrote:
 Hi Markus,
 
 But the problem is, we do not know the words beforehand. What will the
 facet query be?
 
 If you can just explain it to me with an example, it would be really nice of you.
 
 Regards,
 Ninad R
 
 On Thu, Apr 8, 2010 at 2:09 PM, Markus Jelsma mar...@buyways.nl wrote:
  The faceting engine can do this job.
 
  On Thursday 08 April 2010 10:16:09 Ninad Raut wrote:
   Hi,
  
   I have a business use case wherein I have to generate a tag cloud for
   words with frequency greater than a specified threshold.
  
   The way I store records in Solr is:
   For every Solr document (which includes content) I store a multivalued
   entry of buzzwords with their frequency.
  
   The technical problem I face is:
   While generating a tag cloud I do not know the buzzwords beforehand.
   Moreover, I want the frequency total for a buzzword across documents.
  
   In SQL the way to do it is:
  
   Select buzzWord, sum(frequency)
   from Verbatim
   group by buzzWord
   having count(frequency) > thresholdValue
  
   Is there a similar way I can query Solr? Even a workaround solution
   to this will do.
  
  
   Thanks.
  
   Regards,
   Ninad R
 
  Markus Jelsma - Technisch Architect - Buyways BV
  http://www.linkedin.com/in/markus17
  050-8536620 / 06-50258350
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: Re: Using Solr with CouchDB

2010-04-28 Thread Markus Jelsma
Hi,

 

 

Setting up CouchDB-Lucene is quite easy, but I guess you don't want that. You 
could construct a show function to convert input to Solr-accepted XML, which 
should be very straightforward. You just need some program to fetch from 
CouchDB and push it into Solr.
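
A sketch of what such a program would post to Solr's /update handler (the 
field names here are hypothetical, not from this thread):

<add>
  <doc>
    <field name="id">couchdb-document-id</field>
    <field name="title">...</field>
    <field name="content">...</field>
  </doc>
</add>

followed by a separate <commit/> message to make the documents visible.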

 

Cheers,
 
-Original message-
From: Patrick Petermair patrick.peterm...@openforce.com
Sent: Wed 28-04-2010 17:45
To: solr-user@lucene.apache.org; 
Subject: Re: Using Solr with CouchDB

Hey Brendan!

Thanks for your response.

 I don't know much about couch, but if you want to return json from solr
 (which I think couch would understand) you can do that with wt=json
 in the query string when querying solr. See here for more details:
 http://wiki.apache.org/solr/SolJSON

Actually I'm looking for the other way around. I'm trying to get Solr to 
index my CouchDB. CouchDB works with a REST API and returns plaintext JSON.
So I'm looking to get JSON into Solr and not out of it :)

On the CouchDB wiki I've found a reference to a project "CouchDB Solr2" 
which seemed to do exactly what I'm trying to do (full text indexing and 
searching with CouchDB), but it is no longer maintained as of January 
2009 and cannot be found anymore on GitHub. Maybe it's because there is 
now a simple way to do it in Solr and I just haven't found it yet ;)

Patrick



RE: Re: Using Solr with CouchDB

2010-04-28 Thread Markus Jelsma
Whether you need Solr depends on whether you require features such as 
highlighting, faceting, more-like-this etc. They will not work with 
CouchDB-Lucene, nor can you, at this moment, use CouchDB-Lucene behind 
CouchDB-Lounge; although a separate shard can have a sharded Lucene index, you 
cannot query them through smartproxyd.

 

You need to know what you want to do with full-text search before choosing, 
and join CouchDB's mailing list if you haven't already.
 
-Original message-
From: Patrick Petermair patrick.peterm...@openforce.com
Sent: Wed 28-04-2010 18:03
To: solr-user@lucene.apache.org; 
Subject: Re: Using Solr with CouchDB


 Setting up CouchDB-Lucene is quite easy, but I guess you don't want
 that.

Yeah, I was thinking about CouchDB-Lucene too (also found it in the 
CouchDB wiki). It's not like I HAVE to make it work with Solr. If it 
turns out that it's not possible or a pain in the ass, I'll probably go 
for the easy way with CouchDB-Lucene.

Patrick


RE: schema.xml question

2010-05-07 Thread Markus Jelsma
You could write your own requestHandler in solrconfig.xml, it'll allow you to 
predefine parameters for your configured search components.
 
-Original message-
From: Antonello Mangone antonello.mang...@gmail.com
Sent: Fri 07-05-2010 15:17
To: solr-user@lucene.apache.org; 
Subject: schema.xml question

Hello everyone, my question is:
Is it possible in schema.xml to set a group of fields to use as a default field
to query in OR or in AND?

example:

<group name="group_name">
   <field name="a" type="..." />
   <field name="b" type="..." />
   <field name="c" type="..." />
</group>

<defaultSearchField>group_name</defaultSearchField>

Thanks in advance


RE: Re: schema.xml question

2010-05-07 Thread Markus Jelsma
A requestHandler works as a URL that can have predefined parameters. By 
default you will be querying the /select/ requestHandler. It, for instance, 
predefines the default number of rows to return (10) and returns all fields of 
a document (*).

 

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!--
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    -->
  </lst>
</requestHandler>

 

But you can also define more complex requestHandlers. The default configuration 
adds the dismax requestHandler (/dismax/), but it's actually the same as the 
default requestHandler if you would define all those configured parameters in 
your URL. So by defining the parameters in solrconfig.xml, you won't need 
to pass them in your query. You can of course override predefined parameters, 
with the exception of parameters defined inside an invariants block.
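
For example, a minimal sketch of such an invariants block inside a 
requestHandler (the pinned parameter is arbitrary here):

<lst name="invariants">
  <str name="rows">10</str>
</lst>

Any rows parameter passed in the URL would then be ignored for this handler.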

 

Check the documentation [1] on this subject, but I would suggest you study the 
shipped solrconfig.xml [2] configuration file; it offers a better explanation of 
the subject.

 

[1]: http://wiki.apache.org/solr/SolrConfigXml

[2]: 
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

 

 

Cheers,
 
-Original message-
From: Antonello Mangone antonello.mang...@gmail.com
Sent: Fri 07-05-2010 15:26
To: solr-user@lucene.apache.org; 
Subject: Re: schema.xml question

For the moment I don't know how to do it, but I'll follow your suggestion :)
Thank you very much ...
ps. I'm just a novice

2010/5/7 Markus Jelsma markus.jel...@buyways.nl

 You could write your own requestHandler in solrconfig.xml, it'll allow you
 to predefine parameters for your configured search components.

 -Original message-
 From: Antonello Mangone antonello.mang...@gmail.com
 Sent: Fri 07-05-2010 15:17
 To: solr-user@lucene.apache.org;
 Subject: schema.xml question

 Hello everyone, my question is:
 Is it possible in schema.xml to set a group of fields to use as a default
 field
 to query in OR or in AND?

 example:

 <group name="group_name">
    <field name="a" type="..." />
    <field name="b" type="..." />
    <field name="c" type="..." />
 </group>

 <defaultSearchField>group_name</defaultSearchField>

 Thanks in advance


 


RE: Re: schema.xml question

2010-05-07 Thread Markus Jelsma
I forgot, there is actually a proper wiki page on this subject:

http://wiki.apache.org/solr/SolrRequestHandler

 


 
-Original message-
From: Antonello Mangone antonello.mang...@gmail.com
Sent: Fri 07-05-2010 15:26
To: solr-user@lucene.apache.org; 
Subject: Re: schema.xml question

For the moment I don't know how to do it, but I'll follow your suggestion :)
Thank you very much ...
ps. I'm just a novice

2010/5/7 Markus Jelsma markus.jel...@buyways.nl

 You could write your own requestHandler in solrconfig.xml, it'll allow you
 to predefine parameters for your configured search components.

 -Original message-
 From: Antonello Mangone antonello.mang...@gmail.com
 Sent: Fri 07-05-2010 15:17
 To: solr-user@lucene.apache.org;
 Subject: schema.xml question

 Hello everyone, my question is:
 Is it possible in schema.xml to set a group of fields to use as a default
 field
 to query in OR or in AND?

 example:

 <group name="group_name">
    <field name="a" type="..." />
    <field name="b" type="..." />
    <field name="c" type="..." />
 </group>

 <defaultSearchField>group_name</defaultSearchField>

 Thanks in advance



RE: Help indexing PDF files

2010-05-07 Thread Markus Jelsma
Hi,

 

 

The wiki page [1] on this subject will get you started.

 

[1]: http://wiki.apache.org/solr/ExtractingRequestHandler
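
A sketch of sending a PDF to the extracting handler, roughly following the 
wiki's curl example (the id literal and file name are placeholders):

curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' \
  -F "myfile=@some-document.pdf"

The extracted text ends up in the fields configured for the handler; check the 
wiki page for the fmap.* mapping parameters.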

 

 

Cheers
 
-Original message-
From: Leonardo Azize Martins laz...@gmail.com
Sent: Fri 07-05-2010 15:37
To: solr-user@lucene.apache.org; 
Subject: Help indexing PDF files

Hi,

I am new to Solr.
I would like to index some PDF files.

How can I do this using the example schema from the 1.4.0 version?

Regards,
Leo


RE: Re: Help indexing PDF files

2010-05-07 Thread Markus Jelsma
You don't need it, you can use any PDF file.
 
-Original message-
From: Leonardo Azize Martins laz...@gmail.com
Sent: Fri 07-05-2010 15:45
To: solr-user@lucene.apache.org; 
Subject: Re: Help indexing PDF files

I am using this page, but in my downloaded version there is no site
directory.

Thanks

2010/5/7 Markus Jelsma markus.jel...@buyways.nl

 Hi,





 The wiki page [1] on this subject will get you started.



 [1]: http://wiki.apache.org/solr/ExtractingRequestHandler





 Cheers

 -Original message-
 From: Leonardo Azize Martins laz...@gmail.com
 Sent: Fri 07-05-2010 15:37
 To: solr-user@lucene.apache.org;
 Subject: Help indexing PDF files

 Hi,

 I am new to Solr.
 I would like to index some PDF files.

 How can I do this using the example schema from the 1.4.0 version?

 Regards,
 Leo



RE: How to query for similar documents before indexing

2010-05-10 Thread Markus Jelsma
Hi,

 

 

Deduplication [1] is what you're looking for. It can utilize different analyzers 
that will add one or more signatures or hashes to your document, depending on 
exact or partial matches for configurable fields. Based on that, it should be 
able to prevent new documents from entering the index. 

 

The first part works very well, but I have some issues with removing those 
documents, which I also need to check with the community tomorrow back at 
work ;-)

 

 

[1]: http://wiki.apache.org/solr/Deduplication

 

Cheers,


 
-Original message-
From: Matthieu Labour matthieu_lab...@yahoo.com
Sent: Mon 10-05-2010 22:41
To: solr-user@lucene.apache.org; 
Subject: How to query for similar documents before indexing

Hi

I want to implement the following logic:

Before I index a new document into the index, I want to check if there are 
already documents in the index with similar content to the content of the 
document about to be inserted. If the request returns 1 or more documents, then 
I don't want to insert the document.

What is the best way to achieve the above functionality ?

I read about fuzzy searches in Lucene. But can I really build a request such as 
mydoc.title:wordexample~ AND mydoc.content:(all the content words)~0.9 ?

Thank you for your help


RE: How to query for similar documents before indexing

2010-05-10 Thread Markus Jelsma
Hi Matthieu,

 

 

At the top of the wiki page you can see it's in 1.4 already. As far as I know, 
the API doesn't return information on found duplicates in its response header; 
the wiki isn't clear on that subject. I, at least, never saw any other response 
than an error or the usual status code and QTime.

 

Perhaps it would be a nice feature. On the other hand, you can also have a 
manual process that finds duplicates based on that signature and gathers that 
information yourself, as long as such a feature isn't there.

 

 

Cheers,


 
-Original message-
From: Matthieu Labour matthieu_lab...@yahoo.com
Sent: Mon 10-05-2010 23:30
To: solr-user@lucene.apache.org; 
Subject: RE: How to query for similar documents before indexing

Markus
Thank you for your response
That would be great if the index has the option to prevent duplicates from 
entering the index. But is it going to be a silent action? Or will the add 
method return that it failed indexing because it detected a duplicate?
Is it committed to 1.4 already?
Cheers
matt


--- On Mon, 5/10/10, Markus Jelsma markus.jel...@buyways.nl wrote:

From: Markus Jelsma markus.jel...@buyways.nl
Subject: RE: How to query for similar documents before indexing
To: solr-user@lucene.apache.org
Date: Monday, May 10, 2010, 4:11 PM

Hi,

 

 

Deduplication [1] is what you're looking for. It can utilize different 
analyzers that will add one or more signatures or hashes to your document, 
depending on exact or partial matches for configurable fields. Based on that, 
it should be able to prevent new documents from entering the index. 

 

The first part works very well, but I have some issues with removing those 
documents, which I also need to check with the community tomorrow back at 
work ;-)

 

 

[1]: http://wiki.apache.org/solr/Deduplication


 

Cheers,


 
-Original message-
From: Matthieu Labour matthieu_lab...@yahoo.com
Sent: Mon 10-05-2010 22:41
To: solr-user@lucene.apache.org; 
Subject: How to query for similar documents before indexing

Hi

I want to implement the following logic:

Before I index a new document into the index, I want to check if there are 
already documents in the index with similar content to the content of the 
document about to be inserted. If the request returns 1 or more documents, then 
I don't want to insert the document.

What is the best way to achieve the above functionality ?

I read about fuzzy searches in Lucene. But can I really build a request such as 
mydoc.title:wordexample~ AND mydoc.content:(all the content words)~0.9 ?

Thank you for your help

Dedupe and overwriteDupes setting

2010-05-11 Thread Markus Jelsma
List,


I've stumbled upon an issue with the deduplication mechanism. It either 
deletes all documents or does nothing at all, depending on the 
overwriteDupes setting, resp. true and false.

I use a slightly modified configuration:

  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">sig</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">content</str>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>


<field name="sig" type="string" stored="true" indexed="false"
multiValued="true" />

After importing new documents I can (only with overwriteDupes=false) clearly 
see the correct signatures. Most documents have a distinct signature and some 
share the same because the content field's value is identical for those 
documents.


Anyway, why does it delete all my documents? Any clues? The wiki is not very 
helpful on this subject.


Cheers.


Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Dedupe and overwriteDupes setting

2010-05-11 Thread Markus Jelsma
It seems this e-mail already left the outbox yesterday. Apologies for the 
spam.


On Tuesday 11 May 2010 10:13:18 Markus Jelsma wrote:
 List,
 
 
 I've stumbled upon an issue with the deduplication mechanism. It either
 deletes all documents or does nothing at all, depending on the
 overwriteDupes setting, resp. true and false.
 
 I use a slightly modified configuration:
 
   <updateRequestProcessorChain name="dedupe">
     <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
       <bool name="enabled">true</bool>
       <str name="signatureField">sig</str>
       <bool name="overwriteDupes">true</bool>
       <str name="fields">content</str>
       <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>
 
 
 <field name="sig" type="string" stored="true" indexed="false"
 multiValued="true" />
 
 After importing new documents I can (only with overwriteDupes=false)
 clearly see the correct signatures. Most documents have a distinct
 signature and some share the same because the content field's value is
 identical for those documents.
 
 
 Anyway, why does it delete all my documents? Any clues? The wiki is not
 very helpful on this subject.
 
 
 Cheers.
 
 
 Markus Jelsma - Technisch Architect - Buyways BV
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: How to query for similar documents before indexing

2010-05-11 Thread Markus Jelsma
If you set overwriteDupes=false, the exact or near-duplicate documents will 
not be deleted. The signature field is set, however, so you can later query 
for duplicates yourself in an external program and do whatever you want with 
the duplicates.
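
A sketch of such an external duplicate check, assuming the signature field is 
named sig and is indexed: facet on it and keep only signatures occurring more 
than once:

?q=*:*&rows=0&facet=true&facet.field=sig&facet.mincount=2

Each returned constraint is a signature shared by two or more documents; a 
follow-up query on that signature value then lists the duplicates themselves.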


On Tuesday 11 May 2010 15:41:33 Matthieu Labour wrote:
 Hi Markus
 
 Thank you for your answer
 
 Here is a use case where I think it would be nice to know there is a dup
  before I insert it.
 
 Let's say I create a summary out of the document and I only index the
  summary and store the document itself on a separate device (S3, Cassandra,
  etc ...). Then I would need addDocument on the summary to fail because
  it detected a duplicate, so that I don't need to store the document. 
 When you write:
 On the other hand, you can also have a manual process that finds
 duplicates based on that signature and gather that information yourself
 as long as such a feature isn't there.
 
 Can you explain more what you have in mind ?
 
 Thank you for your help!
 
 matt
 
 --- On Mon, 5/10/10, Markus Jelsma markus.jel...@buyways.nl wrote:
 
 From: Markus Jelsma markus.jel...@buyways.nl
 Subject: RE: How to query for similar documents before indexing
 To: solr-user@lucene.apache.org
 Date: Monday, May 10, 2010, 5:07 PM
 
 Hi Matthieu,
 
  
 
  
 
 At the top of the wiki page you can see it's in 1.4 already. As far as I
  know, the API doesn't return information on found duplicates in its
  response header; the wiki isn't clear on that subject. I, at least, never
  saw any other response than an error or the usual status code and QTime.
 
  
 
 Perhaps it would be a nice feature. On the other hand, you can also have a
  manual process that finds duplicates based on that signature and gathers
  that information yourself, as long as such a feature isn't there.
 
  
 
  
 
 Cheers,
 
 
  
 -Original message-
 From: Matthieu Labour matthieu_lab...@yahoo.com
 Sent: Mon 10-05-2010 23:30
 To: solr-user@lucene.apache.org;
 Subject: RE: How to query for similar documents before indexing
 
 Markus
 Thank you for your response
 That would be great if the index has the option to prevent duplicates from
  entering the index. But is it going to be a silent action? Or will the
  add method return that it failed indexing because it detected a duplicate?
  Is it committed to 1.4 already?
 Cheers
 matt
 
 
 --- On Mon, 5/10/10, Markus Jelsma markus.jel...@buyways.nl wrote:
 
 From: Markus Jelsma markus.jel...@buyways.nl
 Subject: RE: How to query for similar documents before indexing
 To: solr-user@lucene.apache.org
 Date: Monday, May 10, 2010, 4:11 PM
 
 Hi,
 
  
 
  
 
 Deduplication [1] is what you're looking for. It can utilize different
  analyzers that will add one or more signatures or hashes to your
  document, depending on exact or partial matches for configurable fields.
  Based on that, it should be able to prevent new documents from entering
  the index.
 
  
 
 The first part works very well, but I have some issues with removing those
  documents, which I also need to check with the community tomorrow back
  at work ;-)
 
  
 
  
 
 [1]: http://wiki.apache.org/solr/Deduplication
 
 
 
  
 
 Cheers,
 
 
  
 -Original message-
 From: Matthieu Labour matthieu_lab...@yahoo.com
 Sent: Mon 10-05-2010 22:41
 To: solr-user@lucene.apache.org;
 Subject: How to query for similar documents before indexing
 
 Hi
 
 I want to implement the following logic:
 
 Before I index a new document into the index, I want to check if there are
  already documents in the index with similar content to the content of the
  document about to be inserted. If the request returns 1 or more documents,
  then I don't want to insert the document.
 
 What is the best way to achieve the above functionality ?
 
 I read about fuzzy searches in Lucene. But can I really build a request such
  as mydoc.title:wordexample~ AND mydoc.content:(all the content words)~0.9
  ?
 
 Thank you for your help
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: Re: Dedupe and overwriteDupes setting

2010-05-11 Thread Markus Jelsma
Thanks Mark,

 

 

I already fixed it in the meantime and quickly went on with the usual stuff, I 
know, bad me =). I'll file a Jira report tomorrow and update the wiki on this 
subject. I can also file another ticket from another current topic on this 
subject; that's about a proper use-case for the update handler to return 
information on which documents were rejected due to dedupe.

 

I would like to think that updating the wiki with links to those new Jira 
tickets would be a good idea for other readers, is it not?

 

 

Cheers,
 
-Original message-
From: Mark Miller markrmil...@gmail.com
Sent: Tue 11-05-2010 17:25
To: solr-user@lucene.apache.org; 
Subject: Re: Dedupe and overwriteDupes setting

1. You need to set the sig field to indexed.
2. This should be added to the wiki
3. Want to make a JIRA issue? This is not very friendly behavior (when 
you have the sig field set to indexed=false and overwriteDupes=true it 
should likely complain)
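
For reference, applied to the configuration quoted below, point 1 means the 
field definition should read (only the indexed attribute changes):

<field name="sig" type="string" stored="true" indexed="true"
multiValued="true" />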



-- 
- Mark

http://www.lucidimagination.com


On 5/11/10 4:13 AM, Markus Jelsma wrote:
 List,


 I've stumbled upon an issue with the deduplication mechanism. It either
 deletes all documents or does nothing at all, depending on the
 overwriteDupes setting, resp. true and false.

 I use a slightly modified configuration:

   <updateRequestProcessorChain name="dedupe">
     <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
       <bool name="enabled">true</bool>
       <str name="signatureField">sig</str>
       <bool name="overwriteDupes">true</bool>
       <str name="fields">content</str>
       <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>


   <field name="sig" type="string" stored="true" indexed="false"
   multiValued="true" />

 After importing new documents I can (only with overwriteDupes=false) clearly
 see the correct signatures. Most documents have a distinct signature and some
 share the same because the content field's value is identical for those
 documents.


 Anyway, why does it delete all my documents? Any clues? The wiki is not very
 helpful on this subject.


 Cheers.


 Markus Jelsma - Technisch Architect - Buyways BV
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350



RE: Config issue for deduplication

2010-05-13 Thread Markus Jelsma
What's your solrconfig? You get no deduplication if overwriteDupes = false and 
the signature field is other than the doc ID field (unique). 
 
-Original message-
From: Markus Fischer i...@flyingfischer.ch
Sent: Thu 13-05-2010 17:01
To: solr-user@lucene.apache.org; 
Subject: Config issue for deduplication

I am trying to configure automatic deduplication for SOLR 1.4 in Vufind. 
I followed:

http://wiki.apache.org/solr/Deduplication

Actually nothing happens. All records are being imported without any 
deduplication.

What am I missing?

Thanks
Markus

I did:

- created a duplicated set of records, only shifting their IDs by a fixed 
number

---
solrconfig.xml

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
 <lst name="defaults">
     <str name="update.processor">dedupe</str>
 </lst>
</requestHandler>

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">dedupeHash</str>
    <str name="fields">reference,issn</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

---
In schema.xml I added the field

<field name="dedupeHash" type="string" stored="true" indexed="true"
multiValued="false" />

--

If I look at the created field dedupeHash it seems to be empty...!?


RE: Solr read-only core

2010-05-25 Thread Markus Jelsma
Hi,

 

I'd guess there are two ways of doing this, but I've never seen any 
solrconfig.xml file having directives that explicitly disallow 
updates.

 

You'd either have a proxy in front that simply won't allow any other HTTP 
method than GET and HEAD, or you could remove the update request handler from 
your solrconfig.xml file. I've never tried the latter, but I'd figure that 
without a request handler to accommodate updates, no updates can be made.
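
A sketch of the latter approach, assuming the stock 1.4 solrconfig.xml: comment 
out (or remove) the update handler declarations, e.g.:

<!--
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" />
-->

Requests to those URLs will then fail, effectively leaving the core read-only 
over HTTP.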

 

Cheers,
 
-Original message-
From: Yao y...@ford.com
Sent: Tue 25-05-2010 21:49
To: solr-user@lucene.apache.org; 
Subject: Solr read-only core


Is there a way to open a Solr index/core in read-only mode? 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-read-only-core-tp843049p843049.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Filtering near-duplicates using TextProfileSignature

2010-06-09 Thread Markus Jelsma
Here's my config for the updateProcessor. It now uses another signature method, 
but I've used TextProfileSignature as well and it works - sort of.


  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">sig</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">content</str>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>


Of course, you must define the updateProcessor in your requestHandler; it's 
commented out in mine at the moment.


  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <!--
    <lst name="defaults">
      <str name="update.processor">dedupe</str>
    </lst>
    -->
  </requestHandler>


Also, I see you define minTokenLen = 3. Where does that come from? I haven't 
seen anything on the wiki specifying such a parameter.


On Tuesday 08 June 2010 19:45:35 Neeb wrote:
 Hey Andrew,
 
 Just wondering if you ever managed to run TextProfileSignature based
 deduplication. I would appreciate it if you could send me the code fragment
 for it from  solrconfig.
 
 I have currently something like this, but not sure if I am doing it right:
 
  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">signature</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">title,author,abstract</str>
      <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
      <str name="minTokenLen">3</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
 
 --
 
 Thanks in advance,
 -Ali
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Filtering near-duplicates using TextProfileSignature

2010-06-09 Thread Markus Jelsma
Well, it got me too! KMail didn't properly order this thread. Can't seem to 
find Hatcher's reply anywhere. ??!!?


On Tuesday 08 June 2010 22:00:06 Andrew Clegg wrote:
 Andrew Clegg wrote:
  Re. your config, I don't see a minTokenLength in the wiki page for
  deduplication, is this a recent addition that's not documented yet?
 
 Sorry about this -- stupid question -- I should have read back through the
 thread and refreshed my memory.
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Issue with response header in SOLR running on Linux instance

2010-06-09 Thread Markus Jelsma
Hi,


Check your requestHandler. It may preset some values that you don't see. Your 
echoParams setting may be explicit instead of all [1]. Alternatively, you 
could add the echoParams parameter to your query if it isn't set as an 
invariant in your requestHandler.

[1]: http://wiki.apache.org/solr/CoreQueryParameters
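
For example, appending that parameter to the query from the message below 
should make both instances return the full parameter list in the header:

?q=credit&echoParams=all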

Cheers,
 
On Wednesday 09 June 2010 15:25:09 bbarani wrote:
 Hi,
 
 I have been using SOLR for some time now and had no issues while I was using
 it on Windows. Yesterday I moved the SOLR code to Linux servers and started
 to index the data. Indexing completed successfully on the Linux servers, but
 when I queried the index, the response header returned (by the SOLR
 instance running on the Linux server) is different from the response header
 returned by the SOLR instance running on Windows.
 
 Response header returned by SOLR instance running in windows machine
 
 <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">2219</int>
   <lst name="params">
     <str name="indent">on</str>
     <str name="start">0</str>
     <str name="q">credit</str>
     <str name="version">2.2</str>
     <str name="rows">10</str>
   </lst>
 </lst>
 
 
 Response header returned by SOLR instance running in Linux machine
 
 <response>
   <responseHeader>
     <status>0</status>
     <QTime>26</QTime>
     <lst name="params">
       <str name="q">credit</str>
     </lst>
   </responseHeader>
 
 Any idea why this happens?
 
 Thanks,
 Barani
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread Markus Jelsma
Nutch does not, at this moment, support some form of consistent hashing to 
select an appropriate shard. It would be nice if someone could file an issue in 
Nutch's Jira to add sharding support to it, perhaps someone with a better 
understanding and more experience with Solr's distributed search than I have at 
the moment. I can't point Nutch's developers to the right piece of documentation 
on this one ;)
 
-Original message-
From: Otis Gospodnetic otis_gospodne...@yahoo.com
Sent: Wed 16-06-2010 21:03
To: solr-user@lucene.apache.org; 
Subject: Re: Solr and Nutch/Droids - to use or not to use?

Hi Mitch,

Solr can do distributed search, so it can definitely handle indices that can't 
fit on a single server without sharding. What I think *might* be the case is that 
the Nutch indexer that sends docs to Solr might not be capable of sending 
documents to multiple Solr cores/shards. If that is the case, I think you need 
to move this to the Nutch user/dev list and see how to feed multiple Solr 
indices/cores/shards with Nutch data.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message ----
 From: MitchK mitc...@web.de
 To: solr-user@lucene.apache.org
 Sent: Wed, June 16, 2010 2:27:16 PM
 Subject: Re: Solr and Nutch/Droids - to use or not to use?

 Thanks, that really helps to find the right beginning for such a journey. :-)

  * Use Solr, not Nutch's search webapp

 As far as I have read, Solr can't scale if the index gets too large for one
 server.

  The setup explained here has one significant caveat you also need to keep
  in mind: scale. You cannot use this kind of setup with vertical scale
  (collection size) that goes beyond one Solr box. The horizontal scaling
  (query throughput) is still possible with the standard Solr replication
  tools.

 ...from Lucidimagination.com

 Is this still the case?
 Furthermore, as far as I have understood this blogpost:
 http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
 Lucidimagination.com: Nutch and Solr, they index the whole stuff with
 Nutch and reindex it to Solr - sounds like a lot of redundant work.

 Lucid, Sematext and the Nutch-wiki are the only information-sources where I
 can find talks about Nutch and Solr, but no one seems to talk about these
 facts - except this one blogpost.

 If you say this is wrong or contingent on the shown setup, can you tell me
 how to avoid these problems?

 A lot of questions, but it's such an exciting topic...

 Hopefully you can answer some of them.

 Again, thank you for the feedback, Otis.

 - Mitch
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p900604.html
 Sent from the Solr - User mailing list archive at Nabble.com.


RE: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread Markus Jelsma
You're right. Currently, clients need to take care of this; in this case, Nutch 
would be the client, but it cannot be configured as such. It would, indeed, be 
more appropriate for Solr to take care of this. We can already query any server 
with a set of shard hosts specified, so it would make sense if Solr also 
supported some kind of consistent hashing and shard management configuration.

 

With CouchDB-Lounge we can easily create a shard map that supports redundant 
shards on different servers for fail-over. It would be marvelous if Solr 
supported this as well.
 
-Original message-
From: Otis Gospodnetic otis_gospodne...@yahoo.com
Sent: Wed 16-06-2010 21:41
To: solr-user@lucene.apache.org; 
Subject: Re: Re: Solr and Nutch/Droids - to use or not to use?

Well, it's not that Nutch doesn't support it.  Solr itself doesn't support it.  
Indexing applications need to know which shard they want to send documents to.  
This may be a good case for a new wish issue in Solr JIRA?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message ----
 From: Markus Jelsma markus.jel...@buyways.nl
 To: solr-user@lucene.apache.org
 Sent: Wed, June 16, 2010 3:31:49 PM
 Subject: RE: Re: Solr and Nutch/Droids - to use or not to use?

 Nutch does not, at this moment, support some form of consistent hashing to
 select an appropriate shard. It would be nice if someone could file an issue
 in Nutch's Jira to add sharding support to it, perhaps someone with a better
 understanding and more experience with Solr's distributed search than I have
 at the moment. I can't point Nutch's developers to the right piece of
 documentation on this one ;)

 -Original message-
 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 Sent: Wed 16-06-2010 21:03
 To: solr-user@lucene.apache.org;
 Subject: Re: Solr and Nutch/Droids - to use or not to use?

 Hi Mitch,

 Solr can do distributed search, so it can definitely handle indices that
 can't fit on a single server without sharding. What I think *might* be the
 case is that the Nutch indexer that sends docs to Solr might not be capable
 of sending documents to multiple Solr cores/shards. If that is the case, I
 think you need to move this to the Nutch user/dev list and see how to feed
 multiple Solr indices/cores/shards with Nutch data.

 Otis

 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/

 - Original Message ----
  From: MitchK mitc...@web.de
  To: solr-user@lucene.apache.org
  Sent: Wed, June 16, 2010 2:27:16 PM
  Subject: Re: Solr and Nutch/Droids - to use or not to use?

  Thanks, that really helps to find the right beginning for such a journey. :-)

   * Use Solr, not Nutch's search webapp

  As far as I have read, Solr can't scale if the index gets too large for one
  server.

   The setup explained here has one significant caveat you also need to keep
   in mind: scale. You cannot use this kind of setup with vertical scale
   (collection size) that goes beyond one Solr box. The horizontal scaling
   (query throughput) is still possible with the standard Solr replication
   tools.

  ...from Lucidimagination.com

  Is this still the case?
  Furthermore, as far as I have understood this blogpost:
  http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
  Lucidimagination.com: Nutch and Solr, they index the whole stuff with
  Nutch and reindex it to Solr - sounds like a lot of redundant work.

  Lucid, Sematext and the Nutch-wiki are the only information-sources where I
  can find talks about Nutch and Solr, but no one seems to talk about these
  facts - except this one blogpost.

  If you say this is wrong or contingent on the shown setup, can you tell me
  how to avoid these problems?

  A lot of questions, but it's such an exciting topic...

  Hopefully you can answer some of them.

  Again, thank you for the feedback, Otis.

  - Mitch
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p900604.html

RE: federated / meta search

2010-06-17 Thread Markus Jelsma
Hi,

 

Check out Solr's sharding [1] capabilities. I never tested it with different 
schemas, but if each node is queried with fields that it supports, it should 
return useful results.

 

[1]: http://wiki.apache.org/solr/DistributedSearch
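
A sketch of such a distributed query using the wiki's shards parameter (host 
names hypothetical); every field referenced in the query must exist on all 
listed shards:

?q=title:foo&shards=host1:8983/solr,host2:8983/solr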

 

Cheers.
 
-Original message-
From: Sascha Szott sz...@zib.de
Sent: Thu 17-06-2010 19:44
To: solr-user@lucene.apache.org; 
Subject: federated / meta search

Hi folks,

if I'm seeing it right, Solr currently does not provide any support for 
federated / meta searching. Therefore, I'd like to know if anyone has 
already put effort into this direction. Moreover, is federated / meta 
search considered a scenario Solr should be able to deal with at all, or 
is it (far) beyond the scope of Solr?

To be more precise, I'll give you a short explanation of my 
requirements. Assume there are a couple of Solr instances running at 
different places. The documents stored within those instances are all 
from the same domain (bibliographic records), but it cannot be ensured 
that the schema definitions conform 100%. But let's say there are at 
least some index fields that are present in all instances (fields with 
the same name and type definition). Now I'd like to perform a search on 
all instances at the same time (with the restriction that the query 
contains only those fields that overlap among the different schemas) and 
combine the results in a reasonable way by utilizing the score 
information associated with each hit. Please note that due to legal 
issues it is not feasible to build a single index that integrates the 
documents of all Solr instances under consideration.

Thanks in advance,
Sascha



RE: remove from list

2010-06-23 Thread Markus Jelsma
If you want to unsubscribe, then you can do so [1] without trying to sell 
something ;)

 

[1]: http://lucene.apache.org/solr/mailing_lists.html

 

Cheers!
 
-Original message-
From: Susan Rust su...@achieveinternet.com
Sent: Wed 23-06-2010 18:23
To: solr-user@lucene.apache.org; Erik Hatcher erik.hatc...@gmail.com; 
Subject: remove from list

Hey SOLR folks -- There's too much info for me to digest, so please  
remove me from the email threads.

However, if we can build you a forum, bulletin board or other web- 
based tool, please let us know. For that matter, we would be happy to  
build you a new website.

Bill O'Connor is our CTO and the Drupal.org SOLR Redesign Lead. So we  
love SOLR! Let us know how we can support your efforts.

Susan Rust
VP of Client Services

If you wish to travel quickly, go alone
If you wish to travel far, go together

Achieve Internet
1767 Grand Avenue, Suite 2
San Diego, CA 92109

800-618-8777 x106
858-453-5760 x106

Susan-Rust (skype)
@Susan_Rust (twitter)
@Achieveinternet (twitter)
@drupalsandiego (San Diego Drupal Users' Group Twitter)



This message contains confidential information and is intended only  
for the individual named. If you are not the named addressee you  
should not disseminate, distribute or copy this e-mail. Please notify  
the sender immediately by e-mail if you have received this e-mail by  
mistake and delete this e-mail from your system. E-mail transmission  
cannot be guaranteed to be secure or error-free as information could  
be intercepted, corrupted, lost, destroyed, arrive late or incomplete,  
or contain viruses. The sender therefore does not accept liability for  
any errors or omissions in the contents of this message, which arise  
as a result of e-mail transmission. If verification is required please  
request a hard-copy version.













On Jun 23, 2010, at 1:52 AM, Mark Allan wrote:

 Cheers, Geert-Jan, that's very helpful.

 We won't always be searching with dates and we wouldn't want  
 duplicates to show up in the results, so your second suggestion  
 looks like a good workaround if I can't solve the actual problem.  I  
 didn't know about FieldCollapsing, so I'll definitely keep it in mind.

 Thanks
 Mark

 On 22 Jun 2010, at 3:44 pm, Geert-Jan Brits wrote:

  Perhaps my answer is useless, because I don't have an answer to your direct
  question, but:
  You *might* want to consider if your concept of a solr-document is on the
  correct granular level, i.e.:

  your problem posted could be tackled (afaik) by defining a document as a
  'sub-event' with only 1 daterange.
  So each event-doc you have now is replaced by several sub-event docs in
  this proposed situation.

  Additionally, each sub-event doc gets an additional field 'parent-eventid'
  which maps to something like an event-id (which you're probably using).
  So several sub-event docs can point to the same event-id.

  Lastly, all sub-event docs belonging to a particular event implement all the
  other fields that you may have stored in that particular event-doc.

  Now you can query for events based on date-ranges like you envisioned, but
  instead of returning events you return sub-event-docs. However, since all
  data of the original event (except the multiple dateranges) is available in
  the sub-event doc, this shouldn't really bother the client. If you need to
  display all dates of an event (the only info missing from the returned
  solr-doc) you could easily store it in a RDB and fetch it using the defined
  parent-eventid.

  The only caveat I see is that possibly multiple sub-events with the same
  'parent-eventid' might get returned for a particular query.
  This however depends on the type of queries you envision, i.e.:
  1) If you always issue queries with date-filters, and *assuming* that
  sub-events of a particular event don't temporally overlap, you will never
  get multiple sub-events returned.
  2) If 1) doesn't hold, and assuming you *do* mind multiple sub-events of
  the same actual event, you could try to use Field Collapsing on
  'parent-eventid' to only return the first sub-event per parent-eventid that
  matches the rest of your query. (Note however, that Field Collapsing is a
  patch at the moment. http://wiki.apache.org/solr/FieldCollapsing)

  Not sure if this helped you at all, but at the very least it was a nice
  conceptual exercise ;-)

  Cheers,
  Geert-Jan


 2010/6/22 Mark Allan mark.al...@ed.ac.uk

 Hi all,

  Firstly, I apologise for the length of this email but I need to describe
  properly what I'm doing before I get to the problem!

  I'm working on a project just now which requires the ability to store and
  search on temporal coverage data - i.e. a field which specifies a date range
  during which a certain event took place.

  I hunted around for a few days and couldn't find anything which seemed to
  fit, so I had a go at writing my

Re: Cache hits exposed by API

2010-06-29 Thread Markus Jelsma
Hi,


The AdminRequestHandler exposes a JSP [1] that'll return a nice XML document 
with all the information you need about cache statistics and more.

[1]: http://localhost:8983/solr/admin/stats.jsp

Cheers,

On Tuesday 29 June 2010 15:52:56 Na_D wrote:
 This is just an enquiry. I just wanted to know if the cache hit rates of
  Solr are exposed via the API of Solr?
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: Disabling Access to Solr Admin Panel

2010-06-29 Thread Markus Jelsma
Hi,

 

Check out the wiki [1] on this subject.

 

[1]: http://wiki.apache.org/solr/SolrSecurity

 

Cheers,
 
-Original message-
From: Vladimir Sutskever vladimir.sutske...@jpmorgan.com
Sent: Tue 29-06-2010 18:05
To: solr-user@lucene.apache.org; 
Subject: Disabling Access to Solr Admin Panel

Hi All,

How can I forbid access to the SOLR index admin panel?

Can I configure this in jetty.xml?

I understand that it's not true security - considering 
updates/delete/re-indexing commands will still be allowed via GET requests.


Kind regards,

Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.



This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.   

RE: Re: Faceted search outofmemory

2010-06-29 Thread Markus Jelsma
http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit 
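
A sketch of paging through facet constraints (the field name is hypothetical): 
keep facet.limit fixed and advance facet.offset per page:

?q=*:*&facet=true&facet.field=category&facet.limit=100&facet.offset=0
?q=*:*&facet=true&facet.field=category&facet.limit=100&facet.offset=100

Note that paging bounds the response size, not the memory used to compute the 
counts.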
 
-Original message-
From: olivier sallou olivier.sal...@gmail.com
Sent: Tue 29-06-2010 20:11
To: solr-user@lucene.apache.org; 
Subject: Re: Faceted search outofmemory

How do I do paging over facets?

2010/6/29 Ankit Bhatnagar abhatna...@vantage.com


 Did you trying paging them?


 -Original Message-
 From: olivier sallou [mailto:olivier.sal...@gmail.com]
 Sent: Tuesday, June 29, 2010 2:04 PM
 To: solr-user@lucene.apache.org
 Subject: Faceted search outofmemory

 Hi,
 I try to make a faceted search on a very large index (around 200GB with 200M
 docs).
 I get an out of memory error. With no facets it works fine.

 There are quite a few questions around this, but I could not find the answer.
 How can we know the required memory when facets are used, so that I can try
 to scale my server/index correctly to handle it?

 Thanks

 Olivier



RE: Re: Disable Solr Response Formatting

2010-06-30 Thread Markus Jelsma
Hi,

 

My mail client makes a mess out of your example, but if you mean formatting as 
in indenting, then send indent=false - though it's already false by default. 
Check your requestHandler settings.
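
For example (wt and q values arbitrary):

?q=*:*&wt=xml&indent=true    returns indented output
?q=*:*&wt=xml&indent=false   returns the compact, unindented form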

 

Cheers,
 
-Original message-
From: JohnRodey timothydd...@yahoo.com
Sent: Wed 30-06-2010 18:39
To: solr-user@lucene.apache.org; 
Subject: Re: Disable Solr Response Formatting


Oops, let me try that again...

By default my SOLR response comes back formatted, i.e. indented over multiple 
lines. Is there a way to tell it to return it unformatted, on a single line?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Disable-Solr-Response-Formatting-tp933785p933793.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr results not updating

2010-07-06 Thread Markus Jelsma
Hi,

 

If q=*:* doesn't show your insert, then you forgot the commit:

http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
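
A minimal sketch, assuming the default /update handler on localhost:

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
  --data-binary '<commit/>'

New and updated documents only become visible to searches after such a commit.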

 

Cheers,


 
-Original message-
From: Moazzam Khan moazz...@gmail.com
Sent: Tue 06-07-2010 22:09
To: solr-user@lucene.apache.org; 
Subject: Solr results not updating

Hi,

I just successfully inserted a document into Solr, but when I search
for it, it doesn't show up. Is it a cache issue or something? Is there
a way to make sure it was inserted properly, and that it's there?

Thanks,
Moazzam


RE: /select handler statistics

2010-07-12 Thread Markus Jelsma
Hi,

 

I think you're looking for the statistics for the standard request handler.

 

Cheers,
 
-Original message-
From: Vladimir Sutskever vladimir.sutske...@jpmorgan.com
Sent: Mon 12-07-2010 19:44
To: solr-user@lucene.apache.org; 
Subject: /select handler statistics

Hi All,

I am looking at the stats.jsp page in the SOLR admin panel.

I do not see statistics for the /select request handler.

I want to know total # of search requests  + avg time of request ... etc

Am I overlooking something?



Kind regards,

Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.



This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.   

RE: Problem with Wildcard searches in Solr

2010-07-12 Thread Markus Jelsma
Hi,

 

The DisMaxQParser does not support wildcards in its q parameter [1]. You must 
use the LuceneQParser instead. AFAIK, in DisMax, wildcards are part of the 
search query and may get filtered out in your query analyzer.

 

[1]: http://wiki.apache.org/solr/DisMaxRequestHandler#q
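
A sketch of the same search through the Lucene query parser (the field name is 
hypothetical):

?defType=lucene&q=content:com*er

Note that leading wildcards such as *puter are a separate problem: the query 
parser typically only accepts them when the field's index analyzer contains a 
ReversedWildcardFilterFactory.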

 

Cheers,
 
-Original message-
From: imranak imranak...@gmail.com
Sent: Mon 12-07-2010 22:40
To: solr-user@lucene.apache.org; 
Subject: Problem with Wildcard searches in Solr



Hi,

I am having a problem doing wildcard searches in Lucene syntax using the
edismax handler. I have the Solr 4.0 nightly build from the trunk.

A general search like 'computer' returns results, but 'com*er' doesn't return
any results. Similarly, a search like 'co?mput?r' returns no results. The
only type of wildcard search currently working is one with trailing
wildcards (like compute? or comput*).

I want to be able to do searches with wildcards at the beginning (*puter)
and in between (com*er). Could someone please tell me what I am doing wrong
and how to fix it.

Thanks.

Regards,
Imran.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p961448.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Problem with Wildcard searches in Solr

2010-07-12 Thread Markus Jelsma
Hi,

 

Check edismax' JIRA page and its unresolved related issues [1]. AFAIK, it 
hasn't been committed yet.

 

[1]: https://issues.apache.org/jira/browse/SOLR-1553

 

Cheers,
 
-Original message-
From: imranak imranak...@gmail.com
Sent: Mon 12-07-2010 23:55
To: solr-user@lucene.apache.org; 
Subject: RE: Problem with Wildcard searches in Solr


Hi,

Thanks for your response. The dismax query parser doesn't support it, but I
heard the edismax parser supports all kinds of wildcards. I've been trying it
out but without any luck. Could someone please help me with that? I'm unable
to make leading and in-the-middle wildcard searches work.

Thanks.

Imran.


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p961617.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Ranking position in solr

2010-07-13 Thread Markus Jelsma
No, it can be reloaded for each new searcher [1].

[1]: http://wiki.apache.org/solr/QueryElevationComponent#config-file
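
A sketch based on the wiki: if the config-file lives in the index's data 
directory instead of conf/, it is reloaded for each new searcher, so a commit 
is enough to pick up changes - no restart needed:

<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

The elevate.xml itself would contain entries along these lines (the doc ID is 
hypothetical):

<elevate>
  <query text="web">
    <doc id="bookA" />
  </query>
</elevate>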


On Tuesday 13 July 2010 11:02:10 Chamnap Chhorn wrote:
 The problem is that every time I update the elevate.xml, I need to restart
 solr tomcat service. This feature needs to be updated frequently. How would
 i handle that?
 
 Any idea or other solutions?
 
 On Mon, Jul 12, 2010 at 5:45 PM, Ahmet Arslan iori...@yahoo.com wrote:
   I wonder if there is a proper way to fulfill this requirement. A book has
   several keyphrases. Each keyphrase consists of one word to 3 words. The
   author could either buy a keyphrase position or not buy a position. Note:
   each author could buy more than 1 keyphrase. The keyphrase search must be
   exact and case sensitive.
  
   For example: Book A, keyphrases: agile, web, development
   Book B, keyphrases: css, html, web
  
   Let's say the author of Book A buys search result position 1 with keyphrase
   web, so his book should be in the first position. His book should be
   listed before Book B.
  
   Anyone have any suggestions on how to implement this in Solr?
 
  http://wiki.apache.org/solr/QueryElevationComponent - which is used to
  elevate results based on editorial decisions - may help.
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: indexing rich documents

2010-07-13 Thread Markus Jelsma
Hi,

Are you sure you followed the wiki [1] on this subject? There is an example 
there, but you need Solr 1.4.0 or higher. I'm unsure if just patching 1.3.0 will 
really do the trick. The patch must then also include Apache Tika, which sits 
under the hood, extracting content and metadata from various formats.

[1]: http://wiki.apache.org/solr/ExtractingRequestHandler

Cheers,

On Tuesday 13 July 2010 14:11:56 satya swaroop wrote:
 Hi all,
  i am new to solr and followed with the wiki and got the solr admin
 run sucessfully. It is good going for xml files. But to index the rich
 documents i am unable to get it. I followed wiki to make the richer
 documents also,  but i didnt get it.The error comes when i send an pdf/html
 file is a lazy error. can anyone give some detail description about how to
 make richer documents indexable
  i use tomcat and working in ubuntu. The home directory for solr is
 /opt/solr/example and catalina home is /opt/tomcat6.
 
 
 thanks  regards,
  swaroop
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Tag generation

2010-07-15 Thread Markus Jelsma
Check out OpenCalais [1]. Maybe it works for your case and language.

[1]: http://www.opencalais.com/

On Thursday 15 July 2010 17:34:31 kenf_nc wrote:
 A colleague mentioned that he knew of services where you pass some content
 and it spits out some suggested Tags or Keywords that would be best suited
 to associate with that content.
 
 Does anyone know if there is a contrib to Solr or Lucene that does
  something like this? Or a third party tool that can be given a solr index
  or solr query and it comes up with some good Tag suggestions?
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Wiki, login and password recovery

2010-07-19 Thread Markus Jelsma
Hi,

 

This probably should be in INFRA (to which i'm not subscribed) or something 
like that. Anyway, for some reason, my user/pass won't let me log in anymore and 
i'm quite sure my browser still `remembers` the correct combination. I'm unsure 
whether this is a bug: to get that answer, i need to recover my current 
password so i can check... But, how convenient, the password recovery mechanism 
`cannot connect with the mailserver on localhost ERRNO: 60` and times out.

 

Any assistance on this one?

 

Cheers,

 


RE: Re: Wiki, login and password recovery

2010-07-19 Thread Markus Jelsma
This happened just a few hours ago and the problem persists at this very 
moment. I filed an issue: https://issues.apache.org/jira/browse/INFRA-2884

 

Cheers!


 
-Original message-
From: Chris Hostetter hossman_luc...@fucit.org
Sent: Mon 19-07-2010 20:23
To: solr-user@lucene.apache.org; 
Subject: Re: Wiki, login and password recovery


You don't need to subscribe to any infra lists to file an INFRA bug, just 
use Jira...

https://issues.apache.org/jira/browse/INFRA

Note that there was infra work this weekend that involved moving servers 
for the wiki system (as was noted in advance on 
http://monitoring.apache.org and http://twitter.com/infrabot) so maybe you 
just got unlucky with the timing?

 https://blogs.apache.org/infra/entry/new_hardware_for_apache_org 
 
: This probably should be in INFRA (to which i'm not subscribed) or 
: something like that. Anyway, for some reason, my user/pass won't let me 
: login anymore and i'm quite sure my browser still `remembers` the 
: correct combination. I'm unsure whether this is a bug: to get that 
: answer, i need to recover my current password so i can check... But, how 
: convenient, the password recovery mechanism `cannot connect with the 
: mailserver on localhost ERRNO: 60` and times out.



-Hoss


 


RE: boosting particular field values

2010-07-21 Thread Markus Jelsma
Function queries match all documents.


http://wiki.apache.org/solr/FunctionQuery#Using_FunctionQuery
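Note that bq only influences the ranking of documents that already match; to restrict the matches themselves, a filter query is the usual tool. Roughly, against your example:

bq=category:electronics^5.5   (re-ranks matching documents)
fq=category:electronics       (restricts the result set to electronics)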

 
-Original message-
From: Justin Lolofie jta...@gmail.com
Sent: Wed 21-07-2010 20:24
To: solr-user@lucene.apache.org; 
Subject: boosting particular field values

I'm using dismax request handler, solr 1.4.

I would like to boost the weight of certain fields according to their
values... this appears to work:

bq=category:electronics^5.5

However, I think this boosting only affects sorting the results that
have already matched? So if I only get 10 rows back, I might not get
any records back that are category electronics. If I get 100 rows, I
can see that bq is working. However, I only want to get 10 rows.

How does one affect the kinds of results that are matched to begin
with? bq is the wrong thing to use, right?

Thanks for any help,
Justin


Re: SolrJ Response + JSON

2010-07-28 Thread Markus Jelsma
Hi,

I got a response to your e-mail in my box 30 minutes ago. Anyway, enable the 
JSONResponseWriter, if you haven't already, and query with wt=json. It can't 
get much easier.
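A sketch (the writer is declared like this in the example solrconfig.xml; the URL is a placeholder):

<queryResponseWriter name="json" class="solr.JSONResponseWriter"/>

http://localhost:8983/solr/select?q=*:*&wt=json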

Cheers,

On Wednesday 28 July 2010 15:08:26 MitchK wrote:
 Hello ,
 
 Second try to send a mail to the mailing list...
 
 I need to translate SolrJ's response into JSON-response.
 I can not query Solr directly, because I need to do some math with the
 responsed data, before I show the results to the client.
 
 Any experiences how to translate SolrJ's response into JSON without writing
 your own JSON Writer?
 
 Thank you.
 - Mitch
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: Stress Test Solr

2010-08-02 Thread Markus Jelsma
Very interesting. Could you add some information and a link to the relevant wiki 
page [1]?

 

[1]: http://wiki.apache.org/solr/BenchmarkingSolr
 
-Original message-
From: Tomas tomasflo...@yahoo.com.ar
Sent: Mon 02-08-2010 17:34
To: solr-user@lucene.apache.org; 
Subject: Stress Test Solr

Hi All, we've been building an open source tool for load tests on Solr 
installations. The tool is called SolrMeter. It's on Google Code 
at http://code.google.com/p/solrmeter/. Here is some information about it:

SolrMeter is a stress testing / performance benchmarking tool for Apache Solr 
installations. It is licensed under the ASL and developed using Java SE and Swing 
components, connected to Solr using SolrJ.

What can you do with SolrMeter?
The main goal of this open source project is to bring the Apache Solr user 
community a tool for dealing with Solr-specific issues regarding performance and 
stress testing, like firing queries and adding documents, to make sure that your 
Solr installation will support real-world load and demands. With SolrMeter you 
can simulate a workload over the Apache Solr installation and obtain useful 
visual performance statistics and metrics.
Relevant Features:
* Execute queries against a Solr installation
* Execute dummy updates/inserts to the Solr installation, it can be the same 
server as the queries or a different one.
* Configure number of queries to fire in a time period interval
* Configure the number of updates/inserts in a time period.
* Configure commits frequency during adds
* Monitor error counts when adding and commiting documents.
* Perform and monitor index optimization
* Monitor query times online and visually
* Add filter queries into the test queries
* Add facet abilities into the test queries
* Import/Export test configuration
* Query time execution histogram chart
* Query times distribution chart
* Online error log and browsing capabilities
* Individual query graphical log and statistics
* and much more

What do you need to use SolrMeter?
This is one of the most interesting points about SolrMeter: the requirements are 
minimal. It is simple to install and use.
* JRE version 1.6
* The Solr server you want to test.

Who can use SolrMeter?
Everyone who needs to assess a Solr server's performance. To run the tool 
you only need to know about Solr.



Try it and tell us what you think . . . . .  

   Solrmeter Group
   mailto:solrme...@googlegroups.com

What's next?
We are now building version 0.2.0; the objective of this new version is to 
evolve SolrMeter into a pluggable architecture to allow deeper customizations, 
like adding custom statistics, extractors or executors.
We are also adding some usability improvements.

In future versions we want to add better interaction with Solr request 
handlers; for example, showing cache statistics online and graphically on some 
chart would be a great tool.
We also want to add more usability features to make SolrMeter a complete tool 
for testing a Solr installation.
For more details on what's next, check the Issues page on the Google Code site.



      

RE: Phrase search

2010-08-02 Thread Markus Jelsma
Well, the WordDelimiterFilterFactory in your query analyzer clearly makes 
Apple 2 out of Apple2, that's what it's for. If you're looking for an exact 
match, use a string field. Check the output with the debugQuery=true parameter.

 

Cheers, 
 
-Original message-
From: johnmu...@aol.com
Sent: Mon 02-08-2010 20:18
To: solr-user@lucene.apache.org; 
Subject: Phrase search


Hi All,

I don't understand why I'm getting this behavior.  I was under the impression 
that if I search for Apple 2 (with quotes and a space before the 2) it would give 
me different results vs. if I search for Apple2 (with quotes and no space before 
the 2), but it doesn't!  Why? 

Here is my fieldType setting from my schema.xml:

   fieldType name=text class=solr.TextField positionIncrementGap=100
     analyzer type=index
       tokenizer class=solr.WhitespaceTokenizerFactory/
       !-- in this example, we will only use synonyms at query time
       filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt 
ignoreCase=true expand=false/
       --
       filter class=solr.StopFilterFactory ignoreCase=true 
words=stopwords.txt/
       filter class=solr.WordDelimiterFilterFactory generateWordParts=0 
generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0/
       filter class=solr.LowerCaseFilterFactory/
       filter class=solr.EnglishPorterFilterFactory 
protected=protwords.txt/
       filter class=solr.RemoveDuplicatesTokenFilterFactory/
     /analyzer
     analyzer type=query
       tokenizer class=solr.WhitespaceTokenizerFactory/
       !-- filter class=solr.SynonymFilterFactory synonyms=synonyms.txt 
ignoreCase=true expand=true/ --
       filter class=solr.StopFilterFactory ignoreCase=true 
words=stopwords.txt/
       filter class=solr.WordDelimiterFilterFactory generateWordParts=0 
generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0/
       filter class=solr.LowerCaseFilterFactory/
       filter class=solr.EnglishPorterFilterFactory 
protected=protwords.txt/
       filter class=solr.RemoveDuplicatesTokenFilterFactory/
     /analyzer
   /fieldType

What am I missing?!  What part of my solr.WordDelimiterFilterFactory needs to 
change (if that's where the issue is)?

I'm using Solr 1.2.

Thanks in advance.

-M



RE: Re: Phrase search

2010-08-02 Thread Markus Jelsma
Hi,

 

Queries on an analyzed field need to be analyzed as well or they might not 
match. You can configure the WordDelimiterFilterFactory so it will not split 
into multiple tokens on numerics; see the splitOnNumerics parameter [1].

 

[1]: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
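A sketch of the adjusted filter for your query analyzer (note that splitOnNumerics arrived in Solr versions later than the 1.2 you are running, so an upgrade may be needed):

<filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>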

 

Cheers,


 
-Original message-
From: johnmu...@aol.com
Sent: Mon 02-08-2010 21:29
To: solr-user@lucene.apache.org; 
Subject: Re: Phrase search





Thanks for the quick response.

Which part of my WordDelimiterFilterFactory is changing Apple 2 to Apple2?  
How do I fix it?  Also, I'm really confused about this.  I was under the 
impression a phrase search is not impacted by the analyzer, no?

-M


-Original Message-
From: Markus Jelsma markus.jel...@buyways.nl
To: solr-user@lucene.apache.org
Sent: Mon, Aug 2, 2010 2:27 pm
Subject: RE: Phrase search


Well, the WordDelimiterFilterFactory in your query analyzer clearly makes Apple 
2 out of Apple2, that's what it's for. If you're looking for an exact match, 
use a string field. Check the output with the debugQuery=true parameter.

Cheers, 

-Original message-
From: johnmu...@aol.com
Sent: Mon 02-08-2010 20:18
To: solr-user@lucene.apache.org; 
Subject: Phrase search

Hi All,
I don't understand why I'm getting this behavior.  I was under the impression 
that if I search for Apple 2 (with quotes and a space before the 2) it would 
give me different results vs. if I search for Apple2 (with quotes and no space 
before the 2), but it doesn't!  Why? 
Here is my fieldType setting from my schema.xml:
  fieldType name=text class=solr.TextField positionIncrementGap=100
   analyzer type=index
     tokenizer class=solr.WhitespaceTokenizerFactory/
     !-- in this example, we will only use synonyms at query time
     filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt 
ignoreCase=true expand=false/
     --
     filter class=solr.StopFilterFactory ignoreCase=true 
words=stopwords.txt/
     filter class=solr.WordDelimiterFilterFactory generateWordParts=0 
generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0/
     filter class=solr.LowerCaseFilterFactory/
     filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/
     filter class=solr.RemoveDuplicatesTokenFilterFactory/
   /analyzer
   analyzer type=query
     tokenizer class=solr.WhitespaceTokenizerFactory/
     !-- filter class=solr.SynonymFilterFactory synonyms=synonyms.txt 
ignoreCase=true expand=true/ --
     filter class=solr.StopFilterFactory ignoreCase=true 
words=stopwords.txt/
     filter class=solr.WordDelimiterFilterFactory generateWordParts=0 
generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0/
     filter class=solr.LowerCaseFilterFactory/
     filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/
     filter class=solr.RemoveDuplicatesTokenFilterFactory/
   /analyzer
 /fieldType
What am I missing?!  What part of my solr.WordDelimiterFilterFactory needs to 
change (if that's where the issue is)?
I'm using Solr 1.2.
Thanks in advance.
-M



RE: Multi word synonyms

2010-08-03 Thread Markus Jelsma
Hi,

 

This happens because your tokenizer will generate separate tokens for `exercise 
dvds`, so the SynonymFilter will try to find declared synonyms for `exercise` 
and `dvds` separately. Its behavior is documented [1] on the wiki.

 

[1]: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
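A workaround sketch: do the multi-word mapping at index time, where the whole field passes through the filter as one token stream, because at query time the parser splits on whitespace before analysis. With a synonyms.txt line like the one below and expand=true in the index analyzer, documents containing fitness also get the exercise dvds tokens indexed, so the phrase query "exercise dvds" can match them:

exercise dvds, fitness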

 

Cheers,
 
-Original message-
From: Qwerky neil.j.tay...@hmv.co.uk
Sent: Tue 03-08-2010 18:35
To: solr-user@lucene.apache.org; 
Subject: Multi word synonyms


I'm having trouble getting multi word synonyms to work. As an example I have
the following synonym;

exercise dvds = fitness

When I search for exercise dvds I want to return all docs in the index which
contain the keyword fitness. I've read the wiki about
solr.SynonymFilterFactory which recommends expanding the synonym when
indexing, but I'm not sure this is what I want as none of my documents have
the keywords exercise dvds.

Here is the field definition from my schema.xml;





















When I test my search with the analysis page on the admin console it seems
to work fine;

Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory   {}
  term position:    1         2
  term text:        exercise  dvds
  term type:        word      word
  source start,end: 0,8       9,13

org.apache.solr.analysis.SynonymFilterFactory   {ignoreCase=true, 
synonyms=synonyms.txt, expand=true}
  term position:    1
  term text:        fitness
  term type:        word
  source start,end: 0,13

org.apache.solr.analysis.TrimFilterFactory   {}
  term text:        fitness (unchanged)

org.apache.solr.analysis.StopFilterFactory   {ignoreCase=true, 
enablePositionIncrements=true, words=stopwords.txt}
  term text:        fitness (unchanged)

org.apache.solr.analysis.LowerCaseFilterFactory   {}
  term text:        fitness (unchanged)

org.apache.solr.analysis.SnowballPorterFilterFactory   {language=English, 
protected=protwords.txt}
  term position:    1
  term text:        fit
  term type:        word
  source start,end: 0,13

...but when I perform the search it doesn't seem to use the
SynonymFilterFactory;



responseHeader: status=0, QTime=0
params: q=exercise dvds, start=0, rows=10, fl=*,score, qt=standard, version=2.2, 
indent=on, debugQuery=on
...
querystring: exercise dvds
parsedquery: PRODUCTKEYWORDS:exercis PRODUCTKEYWORDS:dvds
parsedquery_toString: PRODUCTKEYWORDS:exercis PRODUCTKEYWORDS:dvds




RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma
You shouldn't fetch faceting results from analyzed fields, it will mess with 
your results. Search on analyzed fields but don't retrieve values from them. 
 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Wed 04-08-2010 22:15
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces


I changed the field types to text_ws. 

Now I only seem to have problems with field values that hold spaces; see
below:

  field name=city type=text_ws indexed=true stored=true/
  field name=theme type=text_ws indexed=true stored=true
multiValued=true omitNorms=true termVectors=true /
  field name=features type=text_ws indexed=true stored=true
multiValued=true/
  field name=services type=text_ws indexed=true stored=true
multiValued=true/
  field name=province type=text_ws indexed=true stored=true/

It has now become:

facet_counts:{
 facet_queries:{},
 facet_fields:{
theme:[
Gemeentehuis,2,
,1,    <-- still is created as a separate facet
Strand,1,
Zee,1],
features:[
Cafe,3,
Danszaal,2,
Tuin,2,
Strand,1],
province:[
Gelderland,1,
Utrecht,1,
Zuid-Holland,1],  <-- this is now correct
services:[
Exclusieve,2,
Fotoreportage,2,
huur,2,
Live,1,  <-- Live muziek is split and separate facets are created
muziek,1]},
 facet_dates:{}}}


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma
Hmm, you should first read a bit more on schema design on the wiki and learn 
about indexing and querying Solr.

 

The copyField directive is what is commonly used in a faceted navigation 
system: search on analyzed fields, show faceting results using the primitive 
string field type. With copyField you can, well, copy the field from one to 
another without it being analyzed by the first - so no chaining takes place, 
which is good. 

 

Let's say you have a city field you want to navigate with, but also search in, 
then you would have an analyzed field for search and a string field for 
displaying the navigation.
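A minimal schema.xml sketch (field and type names are placeholders):

<field name="city" type="text_ws" indexed="true" stored="false"/>
<field name="city_raw" type="string" indexed="true" stored="true"/>
<copyField source="city" dest="city_raw"/>

Search and filter on city; facet and display on city_raw.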

 

But, check the wiki on this subject.
 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Wed 04-08-2010 22:23
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces


Sorry, but Im a newbie to Solr...how would I change my schema.xml to match
your requirements?

And what do you mean by it will mess with your results? What will happen
then?


RE: Re: Load cores without restarting/reloading Solr

2010-08-05 Thread Markus Jelsma
http://wiki.apache.org/solr/CoreAdmin
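For example (host and paths are placeholders), a new core can be created and loaded on a running multicore instance with:

http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=/path/to/core1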
 
-Original message-
From: Karthik K karthikkato...@gmail.com
Sent: Thu 05-08-2010 12:00
To: solr-user@lucene.apache.org; 
Subject: Re: Load cores without restarting/reloading Solr

Can some one please answer this.

Is there a way of creating/adding a core and starting it without having to
reload Solr ?


RE: dismax debugging hyphens dashes

2010-08-07 Thread Markus Jelsma
Well, that smells like the WordDelimiterFilterFactory [1]. It splits, as your 
debug output shows, the value into three separate tokens. This means that (at 
least) the strings 'abc', '12' and 'def' are in your index and can be found; 
the abc12 value is not present. If you want to query for substrings, you can 
try the NGramFilterFactory [2]. It's not really documented on the wiki but 
searching will help [3].

 

[1]: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

[2]: http://search.lucidimagination.com/search/document/CDRG_ch05_5.5.6
[3]: 
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
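A rough field type sketch (gram sizes are arbitrary; expect the index to grow considerably):

<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>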
-Original message-
From: j jta...@gmail.com
Sent: Sat 07-08-2010 19:18
To: solr-user@lucene.apache.org; 
Subject: dismax debugging hyphens dashes

How does one debug index vs. dismax query parser?

I have a solr instance with 1 document whose title is ABC12-def. I
am using dismax. While abc, 12, and def do match, abc12 and
def do not. Here is a the parsedquery_toString, I'm having trouble
understanding it:

+(id:abc12^3.0 | title:(abc12 abc) 12^1.5) (id:abc12^3.0 |
title:(abc12 abc) 12^1.5)

Does anyone have advice for getting this to work?

 


RE: Re: Facet Fields - ID vs. Display Value

2010-08-09 Thread Markus Jelsma
Well, you can do both of course, but there's no need for additional code if you 
get it for free. I'd prefer - as most would, I assume - to use the label as a 
facet field.
 
-Original message-
From: Frank A fsa...@gmail.com
Sent: Tue 10-08-2010 01:11
To: solr-user@lucene.apache.org; 
Subject: Re: Facet Fields - ID vs. Display Value

What I meant (which I realize now wasn't very clear) was if I have
something like categoryID and categorylabel - is the normal practice
to define categoryID as the facet field and then have the UI layer
display the label?  Or would it be normal to directly use
categorylabel as the facet field?



On Mon, Aug 9, 2010 at 6:01 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi Frank,

 I'm not sure what you mean by that.
 If the question is about what should be shown in the UI, it should be 
 something
 pretty and human-readable, such as the original facet string value, assuming 
 it
 was nice and clean.

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message 
 From: Frank A fsa...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Mon, August 9, 2010 5:19:57 PM
 Subject: Facet Fields - ID vs. Display Value

 Is there a general best practice on whether facet fields should be on
 IDs  or Display values?

 -Frank




RE: Re: uniqueKey and custom fieldType

2010-08-15 Thread Markus Jelsma
Using copyField to copy it to an analyzed field will do the trick. 
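A sketch (field names are placeholders): keep the unique key a plain string and search on an analyzed copy.

<field name="id" type="string" indexed="true" stored="true"/>
<field name="id_split" type="splitUpStuff" indexed="true" stored="false"/>
<copyField source="id" dest="id_split"/>
<uniqueKey>id</uniqueKey>

A query for 1234567 then matches on id_split, while updates still replace documents by the exact id.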
 
-Original message-
From: j jta...@gmail.com
Sent: Sun 15-08-2010 20:30
To: solr-user@lucene.apache.org; 
Subject: Re: uniqueKey and custom fieldType

Hi Erick, thanks- your explanation makes sense. But how then, do I
make my unique field useful in terms of searching. If I have a unique
column id with value:

sometexthere-1234567

and want it match the query '1234567', I need to use an analyzer to
split up the parts around the hyphen/dash. I guess I could make a copy
of that field in another field with gets analyzed?

Thanks for any advice.



The short answer is that unique keys should be a single
term. String types are guaranteed to be single, since they
aren't analyzed. Your SplitUpStuff type *does* analyze
terms, and can make multiple tokens out of single strings
via WordDelimterFactory.

A common error when thinking about the string type is
not understanding that it is NOT analyzed. It's indexed as
a single term. So when you define UniqueKey of type string,
it behaves as you expect. That is, documents are updated if
the ID field matches exactly - case, spaces, order and all.

By introducing your SplitUpStuff type as UniqueKey, well,
I don't even know what behavior I'd expect. And whatever
behavior I happened to observe would not be guaranteed to
be the behavior of the next release.

Consider what you're asking for and you can see why you
don't want to analyze your uniquekey field. Consider
the following simple text type (where each word is a term).
You have two values from two different docs
doc1: this is a nice unique key
doc2: My Keys are Unique and Nice

It's quite possible, with combinations of analyzers and stemmers
to index the exact same tokens, namely nice, unique and key
for each document. Are these equivalent? Does order count?
Capitalization? It'd just be a nightmare to try to
explain/predict/implement.

Likely whatever behavior you do get is just whatever falls out of the
code. I'm not even sure any attempt is made to enforce uniqueness
on an analyzed field.

HTH
Erick

On Sun, Aug 15, 2010 at 11:59 AM, j jta...@gmail.com wrote:

 I guess another way to pose the question is- what could cause
 uniqueKeyid/uniqueKey   to no longer be respected?


 The last chance I made since I noticed the problem of non-unique docs
 was by changing field title from string to SplitUpStuff. But I
 dont understand how that could affect the uniqueness of a different
 field called id.

 fieldType name=splitUpStuff class=solr.TextField
 positionIncrementGap=100
      analyzer type=index
         filter class=solr.WordDelimiterFilterFactory
 generateWordParts=1 generateNumberParts=0 c
         filter class=solr.StopFilterFactory
                ignoreCase=true
                words=stopwords.txt
                 enablePositionIncrements=false
                /
        filter class=solr.LowerCaseFilterFactory/
        filter class=solr.EnglishPorterFilterFactory
 protected=protwords.txt/
         filter class=solr.RemoveDuplicatesTokenFilterFactory/
      /analyzer
 /fieldType






 In order to make even a guess, we'd have to see your new
 field type. Particularly its field definitions and the analysis
 chain...

 Best
 Erick

 On Fri, Aug 13, 2010 at 5:16 PM, j jta...@gmail.com wrote:

  Does fieldType have any effect on the thing that I specify should be
  unique?
 
  uniqueKey has been working for me up until recently. I change the
  field that is unique from type string to a fieldType that I have
  defined. Now when I do an update I get a newly created document (so
  that I have duplicates).
 
  Has anyone else had this problem before?
 



RE: Newbie question about search behavior

2010-08-16 Thread Markus Jelsma
You can append it in your middleware, or try the EdgeNGramTokenizer [1]. If 
you're going for the latter, don't forget to reindex and expect a larger index.

 

[1]: 
http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html
-Original message-
From: Mike Thomsen mikerthom...@gmail.com
Sent: Mon 16-08-2010 19:09
To: solr-user@lucene.apache.org; 
Subject: Newbie question about search behavior

Is it possible to set up Lucene to treat a keyword search such as

title:News

implicitly like

title:News*

so that any title that begins with News will be returned without the
user having to throw in a wildcard?

Also, are there any common filters and such that are generally
considered a good practice to throw into the schema for an
English-language website?

Thanks,

Mike

 


RE: help on facet range

2010-08-16 Thread Markus Jelsma
No

 

http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range

https://issues.apache.org/jira/browse/SOLR-1240
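On 1.4.0 you can approximate range buckets with facet.query; a sketch with two one-day buckets (boundaries are placeholders):

facet=true&facet.query=timestamp:[0 TO 86399]&facet.query=timestamp:[86400 TO 172799]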
 
-Original message-
From: Peng, Wei wei.p...@xerox.com
Sent: Mon 16-08-2010 20:25
To: solr-user@lucene.apache.org; 
Subject: RE: help on facet range

The solr version that I am using is 1.4.0.
Does it support facet.range?

Wei

-Original Message-
From: Peng, Wei [mailto:wei.p...@xerox.com] 
Sent: Monday, August 16, 2010 2:12 PM
To: solr-user@lucene.apache.org
Subject: help on facet range

I have been trying to use facet by range.

However no matter how I tried, I did not get anything from facet range (
I do get results from facet fields: topic and author).

The query is
http://localhost:8983/solr/select/?facet.range=timestamp&facet.range.start=0&facet.range.end=1277942270&facet.range.gap=86400&facet.range.other=all&indent=on&q=*:*&facet.field=topic&facet.field=author



The facet range field is timestamp, which is defined to be int in the
Schema.

Can someone help me on this problem?



Many Thanks

Wei





RE: Re: Solr searching performance issues, using large documents

2010-08-16 Thread Markus Jelsma
I've no idea if it's possible, but I'd at least try to return an ArrayList of 
rows instead of just a single row. And if it doesn't work, which is probably 
the case, how about filing an issue in Jira?

 

Reading the docs on the matter, I think it should be made possible to 
return multiple rows in an ArrayList.
 
-Original message-
From: Peter Spam ps...@mac.com
Sent: Tue 17-08-2010 00:47
To: solr-user@lucene.apache.org; 
Subject: Re: Solr searching performance issues, using large documents

Still stuck on this - any hints on how to write the JavaScript to split a 
document?  Thanks!


-Pete

On Aug 5, 2010, at 8:10 PM, Lance Norskog wrote:

 You may have to write your own javascript to read in the giant field
 and split it up.
 
 On Thu, Aug 5, 2010 at 5:27 PM, Peter Spam ps...@mac.com wrote:
 I've read through the DataImportHandler page a few times, and still can't 
 figure out how to separate a large document into smaller documents.  Any 
 hints? :-)  Thanks!
 
 -Peter
 
 On Aug 2, 2010, at 9:01 PM, Lance Norskog wrote:
 
 Spanning won't work- you would have to make overlapping mini-documents
 if you want to support this.
 
 I don't know how big the chunks should be- you'll have to experiment.
 
 Lance
 
 On Mon, Aug 2, 2010 at 10:01 AM, Peter Spam ps...@mac.com wrote:
 What would happen if the search query phrase spanned separate document 
 chunks?
 
 Also, what would the optimal size of chunks be?
 
 Thanks!
 
 
 -Peter
 
 On Aug 1, 2010, at 7:21 PM, Lance Norskog wrote:
 
 Not that I know of.
 
 The DataImportHandler has the ability to create multiple documents
 from one input stream. It is possible to create a DIH file that reads
 large log files and splits each one into N documents, with the file
 name as a common field. The DIH wiki page tells you in general how to
 make a DIH file.
 
 http://wiki.apache.org/solr/DataImportHandler
 
 From this, you should be able to make a DIH file that puts log files
 in as separate documents. As to splitting files up into
 mini-documents, you might have to write a bit of Javascript to achieve
 this. There is no data structure or software that implements
 structured documents.
 
 On Sun, Aug 1, 2010 at 2:06 PM, Peter Spam ps...@mac.com wrote:
 Thanks for the pointer, Lance!  Is there an example of this somewhere?
 
 
 -Peter
 
 On Jul 31, 2010, at 3:13 PM, Lance Norskog wrote:
 
 Ah! You're not just highlighting, you're snippetizing. This makes it 
 easier.
 
 Highlighting does not stream- it pulls the entire stored contents into
 one string and then pulls out the snippet.  If you want this to be
 fast, you have to split up the text into small pieces and only
 snippetize from the most relevant text. So, separate documents with a
 common group id for the document it came from. You might have to do 2
 queries to achieve what you want, but the second query for the same
 query will be blindingly fast. Often 1ms.
 
 Good luck!
 
 Lance
 
 On Sat, Jul 31, 2010 at 1:12 PM, Peter Spam ps...@mac.com wrote:
 However, I do need to search the entire document, or else the 
 highlighting will sometimes be blank :-(
 Thanks!
 
 - Peter
 
 ps. sorry for the many responses - I'm rushing around trying to get 
 this working.
 
 On Jul 31, 2010, at 1:11 PM, Peter Spam wrote:
 
 Correction - it went from 17 seconds to 10 seconds - I was changing 
 the hl.regex.maxAnalyzedChars the first time.
 Thanks!
 
 -Peter
 
 On Jul 31, 2010, at 1:06 PM, Peter Spam wrote:
 
 On Jul 30, 2010, at 1:16 PM, Peter Karich wrote:
 
 did you already try other values for hl.maxAnalyzedChars=2147483647
 
 Yes, I tried dropping it down to 21, but it didn't have much of an 
 impact (one search I just tried went from 17 seconds to 15.8 
 seconds, and this is an 8-core Mac Pro with 6GB RAM - 4GB for java).
 
 ? Also regular expression highlighting is more expensive, I think.
 What does the 'fuzzy' variable mean? If you use this to query via
 ~someTerm instead someTerm
 then you should try the trunk of solr which is a lot faster for 
 fuzzy or
 other wildcard search.
 
 fuzzy could be set to * but isn't right now.
 
 Thanks for the tips, Peter - this has been very frustrating!
 
 
 - Peter
 
 Regards,
 Peter.
 
 Data set: About 4,000 log files (will eventually grow to 
 millions).  Average log file is 850k.  Largest log file (so far) 
 is about 70MB.
 
 Problem: When I search for common terms, the query time goes from 
 under 2-3 seconds to about 60 seconds.  TermVectors etc are 
 enabled.  When I disable highlighting, performance improves a lot, 
 but is still slow for some queries (7 seconds).  Thanks in advance 
 for any ideas!
 
 
 -Peter
 
 
 -
 
 4GB RAM server
 % java -Xms2048M -Xmx3072M -jar start.jar
 
 -
 
 schema.xml changes:

RE: Faceting by fields that contain special characters

2010-08-19 Thread Markus Jelsma
A very common issue, you need to facet on a non-analyzed field.


http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-td1023699.html#a1222961
 
-Original message-
From: Christos Constantinou ch...@simpleweb.co.uk
Sent: Thu 19-08-2010 15:08
To: solr-user@lucene.apache.org; 
Subject: Faceting by fields that contain special characters

Hi all,

I am doing a faceted search on a solr field that contains URLs, for the sole 
purpose of trying to locate duplicate URLs in my documents.

However, the solr response I get looks like this:
public 'com' = int 492198
         public 'flickr' = int 492198
         public 'http' = int 492198
         public 'www' = int 253881
         public 'photo' = int 253843
         public 'n' = int 253318
         public 'httpwwwflickrcomphoto' = int 253316
         public 'farm' = int 238317
         public 'httpfarm' = int 238317
         public 'jpg' = int 238317
         public 'static' = int 238317
         public 'staticflickrcom' = int 238317
         public '5' = int 237939
         public '00' = int 61009
         public 'b' = int 59463
         public 'c' = int 59094
         public 'f' = int 59004
         public 'd' = int 58995
         public 'e' = int 58818
         public 'a' = int 58327
         public '08' = int 33797
         public '06' = int 33341
         public '04' = int 29902
         public '02' = int 29224
         public '2' = int 26671
         public '4' = int 26613
         public '6' = int 26606
         public '03' = int 26506
         public '1' = int 26389
         public '8' = int 26384
It should instead have the entire URL as the variable name, but the name is 
only a part of the URL. Is this because characters like :// in http:// cannot 
be used in variable names? If so, is there any workaround to the problem or an 
alternative way to detect duplicates?

Thanks

Christos



RE: Showing results based on facet selection

2010-08-19 Thread Markus Jelsma
Hi,

 

A facet query serves a different purpose [1]. You need to filter your result 
set [2]. And don't forget to follow the links on caching and such.

 

[1]: 
http://wiki.apache.org/solr/SimpleFacetParameters#facet.query_:_Arbitrary_Query_Faceting

[2]: http://wiki.apache.org/solr/CommonQueryParameters#fq
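Against the example below that would be something like (field names taken from your output):

q=*:*&fq=themes_raw:Gemeentehuis&fq=features_raw:Strand&facet=true&facet.field=themes_raw&facet.field=features_raw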
 

Cheers, 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Thu 19-08-2010 14:10
To: solr-user@lucene.apache.org; 
Subject: Showing results based on facet selection


I have indexed all data (as can be seen below).

But now I want to be able to simulate when a user clicks on a facet value,
for example clicks on the value Gemeentehuis of facet themes_raw AND has
a selection on features facet on value Strand

I've been playing with facet.query function:
facet.query=themes_raw:Gemeentehuis&facet.query=features_raw:Strand

But without luck.

{
responseHeader:{
 status:0,
 QTime:0,
 params:{
facet:true,
fl:id,title,city,score,themes,features,official,services,
indent:on,
q:*:*,
facet.field:[province_raw,
services_raw,
themes_raw,
features_raw],
wt:json}},
response:{numFound:3,start:0,maxScore:1.0,docs:[
{
id:1,
title:Gemeentehuis Nijmegen,
services:[
 Fotoreportage],
features:[
 Tuin,
 Cafe],
themes:[
 Gemeentehuis],
score:1.0},
{
id:2,
title:Gemeentehuis Utrecht,
services:[
 Fotoreportage,
 Exclusieve huur],
features:[
 Tuin,
 Cafe,
 Danszaal],
themes:[
 Gemeentehuis,
 Strand  Zee],
score:1.0},
{
id:3,
title:Beachclub Vroeger,
services:[
 Exclusieve huur,
 Live muziek],
features:[
 Strand,
 Cafe,
 Danszaal],
themes:[
 Strand  Zee],
score:1.0}]
},
facet_counts:{
 facet_queries:{},
 facet_fields:{
province_raw:[
Gelderland,1,
Utrecht,1,
Zuid-Holland,1],
services_raw:[
Exclusieve huur,2,
Fotoreportage,2,
Live muziek,1],
themes_raw:[
Gemeentehuis,2,
Strand  Zee,2],
features_raw:[
Cafe,3,
Danszaal,2,
Tuin,2,
Strand,1]},
 facet_dates:{}}}



RE: Solr for multiple websites

2010-08-19 Thread Markus Jelsma
http://osdir.com/ml/solr-user.lucene.apache.org/2009-09/msg00630.html

http://osdir.com/ml/solr-user.lucene.apache.org/2009-03/msg00309.html

 

Load balancing is a bit out of scope here, but all you need is a simple HTTP load 
balancer and a replication mechanism, depending on your setup.
 
-Original message-
From: Hitendra Molleti hitendra.moll...@itp.com
Sent: Thu 19-08-2010 14:38
To: solr-user@lucene.apache.org; 
CC: 'Jonathan DeMello' jonathan.deme...@itp.com; amer.mahf...@itp.com; 
'Nishchint Yogishwar' nishchint.yogish...@itp.com; 
Subject: RE: Solr for multiple websites

Thanks Girjesh.

Can you please let me know what are the pros and cons of this apporoach.

Also, how can we setup load balancing between multiple solrs

Thanks

Hitendra 

-Original Message-
From: Grijesh.singh [mailto:pintu.grij...@gmail.com] 
Sent: Thursday, August 19, 2010 10:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr for multiple websites


Using multicore is the right approach 




RE: Autosuggest on PART of cityname

2010-08-19 Thread Markus Jelsma
You need a new analyzed field with the EdgeNGramTokenizer or you can try 
facet.prefix for this to work. To retrieve the number of locations for that 
city, just use the results from the faceting engine as usual.

 

I'm unsure which approach is actually faster, but I'd guess the 
EdgeNGramTokenizer, though it also takes up more disk space; using the 
faceting engine will not take more disk space.
 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Thu 19-08-2010 16:46
To: solr-user@lucene.apache.org; 
Subject: Autosuggest on PART of cityname


I want to have a Google-like autosuggest function on citynames. So when user
types some characters I want to show cities that match those characters but
ALSO the amount of locations that are in that city.

Now with Solr I now have the parameter:
fq=title:Bost

But the result doesnt show the city Boston. So the fq parameter now seems to
be an exact match, where I want it to be a partial match as well, more like
this in SQL: WHERE title LIKE 'value%'

How can I do this?





RE: Autosuggest on PART of cityname

2010-08-19 Thread Markus Jelsma
Hmm, you have only four documents in your index, I guess? That would make sense 
because you query for *:*. This technique doesn't rely on the found documents 
but on the faceting engine, so you should include rows=0 in your query; the fl 
parameter is not required anymore. Also, add facet=true to enable the faceting 
engine.

 

http://localhost:8983/solr/db/select/?wt=json&q=*:*&rows=0&facet=true&facet.field=city&facet.prefix=bost


 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Thu 19-08-2010 17:11
To: solr-user@lucene.apache.org; 
Subject: RE: Autosuggest on PART of cityname


Ok, I now tried this:
http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&fl=city&facet.field=city&facet.prefix=Bost

Then I get:
{
responseHeader:{
 status:0,
 QTime:0,
 params:{
fl:city,
indent:on,
q:*:*,
facet.prefix:Bost,
facet.field:city,
wt:json}},
response:{numFound:4,start:0,docs:[
{},
{},
{},
{}]
}}


So 4 total results, but I would have expected 1

What am I doing wrong?


RE: Autosuggest on PART of cityname

2010-08-20 Thread Markus Jelsma
You can't, it's analyzed. And if you facet on a non-analyzed field, you cannot 
distinguish between upper- and lowercase tokens. If you want that, you must 
create a new field with an EdgeNGramTokenizer, search on it and then you can 
facet on a non-analyzed field. Your query will be a bit different then:

 

q=new_ngram_field:utr

rows=0

facet=true

facet.field=non_analyzed_city_field
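
A rough sketch of such an n-gram field type (tokenizer choice and gram sizes are assumptions; keeping the whole city name as one token preserves multi-word names):

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>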

 

 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Fri 20-08-2010 12:36
To: solr-user@lucene.apache.org; 
Subject: RE: Autosuggest on PART of cityname


Ok, I now do this (searching for utr in cityname):
http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&rows=0&facet=true&facet.field=city&facet.prefix=utr

In the DB there's 1 location with cityname 'Utrecht' and the other 1 is with
'Utrecht Overvecht'

So in my dropdown I would like:
Utrecht (1)
Utrecht Overvecht (1)

But I get this:
{
responseHeader:{
 status:0,
 QTime:0,
 params:{
facet:true,
indent:on,
q:*:*,
facet.prefix:utr,
facet.field:city,
wt:json,
rows:0}},
response:{numFound:6,start:0,docs:[]
},
facet_counts:{
 facet_queries:{},
 facet_fields:{
city:[
utrecht,2,
utrechtovervecht,1]},
 facet_dates:{}}}

As you can see it looks at field city, where the tokenizer looks at each
individual word. I also tried city_raw, but that was without any results.

How can I fix that my dropdown will show the correct values?


Re: Document Section in Solr

2010-08-27 Thread Markus Jelsma
You cannot divide a document into sections as far as i know. You could, 
however, store divisions in different fields, if your use-case allows this, 
and retrieve only the fields that you need. This way you can avoid downloading 
20MiB at once.

On Friday 27 August 2010 11:26:05 maheshkumar wrote:
 If the document which is indexed is a big file, is there a provision for
 dividing the document into sections?
 For e.g., a 20MB file divided into 10 sections which will show the right
 section when searched. 
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Status of Solr in the cloud?

2010-08-27 Thread Markus Jelsma
That would be Solr 4.0, or maybe 3.1 first.

http://wiki.apache.org/solr/Solr3.1
http://wiki.apache.org/solr/Solr4.0


On Thursday 26 August 2010 23:58:25 Charlie Jackson wrote:
 There seem to be a few parallel efforts at putting Solr in a cloud
 configuration. See http://wiki.apache.org/solr/KattaIntegration, which
 is based off of https://issues.apache.org/jira/browse/SOLR-1395. Also
 http://wiki.apache.org/solr/SolrCloud which is
 https://issues.apache.org/jira/browse/SOLR-1873. And another JIRA:
 https://issues.apache.org/jira/browse/SOLR-1301.
 
 
 
 These all seem aimed at the same goal, correct? I'm interested in
 evaluating one of these solutions for my company; which is the most
 stable or most likely to eventually be part of the Solr distribution?
 
 
 
 
 
 Thanks,
 
 Charlie
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Auto ID for Documents indexed

2010-08-27 Thread Markus Jelsma
No. Solr doesn't require a unique ID, nor is an auto-incrementing value really 
useful in indices spanning multiple machines. Maybe SOLR-308 could help you 
out, but then the question remains: why would you need a feature like this?

https://issues.apache.org/jira/browse/SOLR-308
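SOLR-308 resulted in a UUID field type; if an automatically generated identifier is all you need, a sketch could be:

<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
<field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>

With default="NEW", Solr generates a fresh UUID for every document that omits the field.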



On Friday 27 August 2010 11:41:55 maheshkumar wrote:
 Is there feature to provide an auto-increment id to the document which is
 getting indexed.
 This is the schema file
 field name=reference type=string indexed=true stored=true
 required=true/
 field name=id type=string indexed=true stored=true/
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Multiple passes with WordDelimiterFilterFactory

2010-08-27 Thread Markus Jelsma
It's just a configured filter, so you should be able to define it twice. Have 
you tried it? It might be tricky though: the output of the first will be the 
input of the second, so I doubt the usefulness of this approach.


On Thursday 26 August 2010 17:45:45 Shawn Heisey wrote:
   Can I pass my data through WordDelimiterFilterFactory more than once?
 It occurs to me that I might get better results if I can do some of the
 filters separately and use preserveOriginal on some of them but not others.
 
 Currently I am using the following definition on both indexing and
 querying.  Would it make sense to do the two differently?
 
 filter class=solr.WordDelimiterFilterFactory
splitOnCaseChange=1
splitOnNumerics=1
stemEnglishPossessive=1
generateWordParts=1
generateNumberParts=1
catenateWords=1
catenateNumbers=1
catenateAll=0
preserveOriginal=1
 /
 
 Thanks,
 Shawn
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: A few query issues with solr

2010-08-27 Thread Markus Jelsma
For solving the car/car-rent issue you'll need to add a SynonymFilter to your 
analyzer chain and configure it accordingly.

On Friday 27 August 2010 13:40:15 hemantverm...@gmail.com wrote:
 this link will help you:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimi
 terFilterFactory
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Problem related to Sorting in Solr1.4

2010-08-27 Thread Markus Jelsma
What seems to be the problem? Did you consult the wiki on this matter?

http://wiki.apache.org/solr/CommonQueryParameters#sort
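Note that sorting needs an indexed, non-multivalued field that produces a single token, so a tokenized text field won't sort reliably; a common sketch (the sort field name is a placeholder) keeps a string copy just for sorting:

<field name="TITLE_SORT" type="string" indexed="true" stored="false"/>
<copyField source="TITLE" dest="TITLE_SORT"/>

...&sort=TITLE_SORT asc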


On Friday 27 August 2010 15:14:06 deepak agrawal wrote:
 Hi,
 
 I have one Text fileld in our schema i want to do the sorting for that
  column.
 
 field name=TITLE type=text indexed=true stored=true /
 field name=UPDBY type=text indexed=true stored=true /
 
 
 I have these two columns i want to use the SORT for these two columns.
 any one can please suggest what should i need to do for that.
 I am currently using Solr1.4.
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: Spellcheck in multilanguage search

2010-08-31 Thread Markus Jelsma
Configure language specific fields and spellcheckers just as you would for a 
single language index, so multiple content_LANG fields and spell_LANG field. 
This will, of course, only work if you know in what language the search 
operates.
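
A sketch of the solrconfig.xml side with one dictionary per language (names, fields and paths are placeholders):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">spell_en</str>
    <str name="field">spell_en</str>
    <str name="spellcheckIndexDir">./spellchecker_en</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">spell_de</str>
    <str name="field">spell_de</str>
    <str name="spellcheckIndexDir">./spellchecker_de</str>
  </lst>
</searchComponent>

The right dictionary is then selected per request with spellcheck=true&spellcheck.dictionary=spell_de.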

 
-Original message-
From: Grijesh.singh pintu.grij...@gmail.com
Sent: Tue 31-08-2010 12:18
To: solr-user@lucene.apache.org; 
Subject: Spellcheck in multilanguage search


How can be spellcheck configured for multilanguage search,I have to index 17
languages in my indexes and search on them also wants to use spellcheck for
that


RE: Memcache for Solr

2010-08-31 Thread Markus Jelsma
Hi,

 

In a restaurant index website, we have used Memcache only for storing the 
generated HTML facet list when q=*. This cached object was only used when no 
additional search parameters were specified. It was quite useful because the 
facet list was always present and only changed if real search parameters were 
specified.

 

We found it wasn't feasible to cache arbitrary result sets; there would be just 
too many result sets to cache, which would probably never be reused anyway, and 
there is the problem of invalidating cached result sets. I'd rather rely on 
Solr's filter cache instead.

 

From that point of view, it's only feasible to cache generated objects (HTML or 
whatever format) that you know are being requested many times. It's easy to 
implement and doesn't take too much memory that won't be reused anyway.

 

Cheers,
 
-Original message-
From: Hitendra Molleti hitendra.moll...@itp.com
Sent: Tue 31-08-2010 16:38
To: solr-user@lucene.apache.org; 
Subject: Memcache for Solr

Hi,

We were looking at implementing Memcache for Solr.

Can someone who has already implemented this let us know if it is a good
option to go for i.e. how effective is using memcache compared to Solr's
internal cache. 

Also, are there any down sides to it and difficult to implement.

Thanks

Hitendra




Re: Proximity search + Highlighting

2010-09-01 Thread Markus Jelsma
I think you need to enable usePhraseHighlighter in order to use the 
highlightMultiTerm parameter.
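So, roughly:

hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true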

 On Wednesday 01 September 2010 12:12:11 Xavier Schepler wrote:
 Hi,
 
 can the highlighting component highlight terms only if the distance
 between them matches the query ?
 I use those parameters :
 
hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=</b>&hl.mergeContiguous=false
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: shingles work in analyzer but not real data

2010-09-01 Thread Markus Jelsma
If your use-case is limited to this, why don't you encapsulate all queries in 
double quotes? 

On Wednesday 01 September 2010 14:21:47 Jeff Rose wrote:
 Hi,
   We are using SOLR to match query strings with a keyword database, where
 some of the keywords are actually more than one word.  For example a
  keyword might be apple pie and we only want it to match for a query
  containing that word pair, but not one only containing apple.  Here is
  the relevant piece of the schema.xml, defining the index and query
  pipelines:
 
   fieldType name=text class=solr.TextField positionIncrementGap=100
  analyzer type=index
tokenizer class=solr.PatternTokenizerFactory pattern=;/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.TrimFilterFactory /
  /analyzer
  analyzer type=query
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.TrimFilterFactory /
 filter class=solr.ShingleFilterFactory /
   /analyzer
/fieldType
 
 In the analysis tool this schema looks like it works correctly.  Our
 multi-word keywords are indexed as a single entry, and then when a search
 phrase contains one of these multi-word keywords it is shingled and
  matched. Unfortunately, when we do the same queries on top of the actual
  index it responds with zero matches.  I can see in the index histogram
  that the terms are correctly indexed from our mysql datasource containing
  the keywords, but somehow the shingling doesn't appear to work on this
  live data.  Does anyone have experience with shingling that might have
  some tips for us, or otherwise advice for debugging the issue?
 
 Thanks,
 Jeff
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: morelikethis - stored=true is necessary?

2010-09-02 Thread Markus Jelsma
The following table [1] will be most helpful! Keep it referenced!

[1]: http://wiki.apache.org/solr/FieldOptionsByUseCase

On Thursday 02 September 2010 13:20:33 zqzuk wrote:
 Hi all
 I am learning to use morelikethis handler, which seems very straightforward
 but I got some problems when testing and I wonder if you could help me.
 
 In my schema I have
 
 field name=page_content type=text indexed=true stored=false
 required=false multiValued=false termVectors=true/
 
 With this schema when I use the query parameter
 mlt.fl=page_content
 
 The returned XML results in the moreLiksThis section shows similarity
 scores of 0 for all documents. However it is not the case for fields that
 define stored=true. Does it mean I must set stored=true for MLT to
  work?
 
 Also, does multivalued have an effect on the result?
 
 
 Thanks!
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: How to retrieve the full corpus

2010-09-06 Thread Markus Jelsma
You can use Luke to inspect a Lucene index. Check the schema browser in your 
Solr admin interface for an example.
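For example, the Luke request handler can list the top terms per field (field name and limit are placeholders):

http://localhost:8983/solr/admin/luke?fl=text&numTerms=100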

On Monday 06 September 2010 16:52:03 Roland Villemoes wrote:
 Hi All,
 
 How can I retrieve all words from a Solr core?
 I need a list of all the words and how often they occur in the index.
 
 med venlig hilsen/best regards
 
 Roland Villemoes
 Tel: (+45) 22 69 59 62
 E-Mail: mailto:r...@alpha-solutions.dk
 
 Alpha Solutions A/S
 Borgergade 2, 3.sal, 1300 København K
 Tel: (+45) 70 20 65 38
 Web: http://www.alpha-solutions.dkhttp://www.alpha-solutions.dk/
 
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: getting started - books/in depth material

2010-09-06 Thread Markus Jelsma
Did you miss the wiki?

http://wiki.apache.org/solr/SolrResources


 
-Original message-
From: Dennis Gearon gear...@sbcglobal.net
Sent: Mon 06-09-2010 22:05
To: solr-user@lucene.apache.org; 
Subject: getting started - books/in depth material

I really don't want to understand the code that is IN Solr/Lucene.

So I'm looking for books on USING Solr/Lucene and configuring it plus making 
good queries.

Any suggestions for current material?


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
 otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


RE: Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

2010-09-06 Thread Markus Jelsma
The remainder of an arithmetic division

http://en.wikipedia.org/wiki/Modulo_operation
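For example, a sketch of how a simple MOD could map documents to shards (names are hypothetical):

// e.g. 17 % 5 == 2: document hashes are spread over numShards buckets
int shard = Math.abs(docId.hashCode()) % numShards;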
-Original message-
From: Dennis Gearon gear...@sbcglobal.net
Sent: Mon 06-09-2010 22:04
To: solr-user@lucene.apache.org; 
Subject: Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

What is a 'simple MOD'?

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
 otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/6/10, Andrzej Bialecki a...@getopt.org wrote:

 From: Andrzej Bialecki a...@getopt.org
 Subject: Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)
 To: solr-user@lucene.apache.org
 Date: Monday, September 6, 2010, 11:30 AM
 On 2010-09-06 16:41, Yonik Seeley
 wrote:
  On Mon, Sep 6, 2010 at 10:18 AM, MitchKmitc...@web.de 
 wrote:
  [...consistent hashing...]
  But it doesn't solve the problem at all, correct
 me if I am wrong, but: If
  you add a new server, let's call him IP3-1, and
 IP3-1 is nearer to the
  current ressource X, than doc x will be indexed at
 IP3-1 - even if IP2-1
  holds the older version.
  Am I right?
  
  Right.  You still need code to handle migration.
  
  Consistent hashing is a way for everyone to be able to
 agree on the
  mapping, and for the mapping to change
 incrementally.  i.e. you add a
  node and it only changes the docid-node mapping of
 a limited percent
  of the mappings, rather than changing the mappings of
 potentially
  everything, as a simple MOD would do.
 
 Another strategy to avoid excessive reindexing is to keep
 splitting the largest shards, and then your mapping becomes
 a regular MOD plus a list of these additional splits.
 Really, there's an infinite number of ways you could
 implement this...
 
  
  For SolrCloud, I don't think we'll end up using
 consistent hashing -
  we don't need it (although some of the concepts may
 still be useful).
 
 I imagine there could be situations where a simple MOD
 won't do ;) so I think it would be good to hide this
 strategy behind an interface/abstract class. It costs
 nothing, and gives you flexibility in how you implement this
 mapping.
 
-- 
Best regards,
Andrzej Bialecki
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
 
 


Re: Nutch/Solr

2010-09-07 Thread Markus Jelsma
Depends on your version of Nutch. At least trunk and 1.1 obey the 
solrmapping.xml file in Nutch' configuration directory. I'd suggest you start 
with that mapping file and the Solr schema.xml file shipped with Nutch as it 
exactly matches with the mapping file.

Just restart Solr with the new schema (or you change the mapping), crawl, 
fetch, parse and update your DB's and then push the index from Nutch to your 
Solr instance.
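
With Nutch 1.1 that final push is the solrindex job; roughly (URL and paths are placeholders):

bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*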


On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
 I tried to combine nutch and solr, want to ask somethig.
 
 After crawling, nutch has certain fields such as; content, tstamp, title.
 
 How can I map content field after crawling ? Do I have change the lucene
 code (such as add extra field)?
 
 Or overcome in solr stage?
 
 Any suggestion?
 
 Thx.
 --
 
 Yavuz Selim YILMAZ
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Nutch/Solr

2010-09-07 Thread Markus Jelsma

You should:
- definitely upgrade to 1.1 (1.2 is on the way), and
- subscribe to the Nutch mailing list for Nutch specific questions. 


On Tuesday 07 September 2010 10:36:58 Yavuz Selim YILMAZ wrote:
 In fact, I used the Nutch 0.9 version, but I am thinking of moving to the new version.
 
 If anybody did something like that, I want to learn from their experience.
 
 If indexing an xml file, there are specific fields and all of them are
 dependent among them, so duplicates don't happen.
 
 I want to extract specific fields from the content field. Doing such
 extraction, the new fields should be indexed as well; it occurs to me that the
 content would then be indexed twice for every new field.
 
 By the way, any details about how to get new fields from the content will
  be helpful.
 --
 
 Yavuz Selim YILMAZ
 
 
 2010/9/7 Markus Jelsma markus.jel...@buyways.nl
 
  Depends on your version of Nutch. At least trunk and 1.1 obey the
  solrmapping.xml file in Nutch' configuration directory. I'd suggest you
  start
  with that mapping file and the Solr schema.xml file shipped with Nutch as
  it
  exactly matches with the mapping file.
 
  Just restart Solr with the new schema (or you change the mapping), crawl,
  fetch, parse and update your DB's and then push the index from Nutch to
  your
  Solr instance.
 
  On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
   I tried to combine nutch and solr, want to ask somethig.
  
   After crawling, nutch has certain fields such as; content, tstamp,
   title.
  
   How can I map content field after crawling ? Do I have change the
 
  lucene
 
   code (such as add extra field)?
  
   Or overcome in solr stage?
  
   Any suggestion?
  
   Thx.
   --
  
   Yavuz Selim YILMAZ
 
  Markus Jelsma - Technisch Architect - Buyways BV
  http://www.linkedin.com/in/markus17
  050-8536620 / 06-50258350
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: Is there a way to fetch the complete list of data from a particular column in SOLR document?

2010-09-07 Thread Markus Jelsma
q=*:*&fl=id_FIELD&rows=NUM_DOCS ?
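
If NUM_DOCS is very large, paging is friendlier on memory (a sketch, using the default /select handler):

http://localhost:8983/solr/select?q=*:*&fl=id&rows=0               (read numFound from the response)
http://localhost:8983/solr/select?q=*:*&fl=id&start=0&rows=10000   (then step start by 10000)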
 
-Original message-
From: bbarani bbar...@gmail.com
Sent: Tue 07-09-2010 23:09
To: solr-user@lucene.apache.org; 
Subject: Is there a way to fetch the complete list of data from a particular 
column in SOLR document?


Hi,

I am trying to get complete list of unique document ID and compare it with
that of back end to make sure that both back end and SOLR documents are in
sync.

Is there a way to fetch the complete list of data from a particular column
in SOLR document?

Once I get the list, I can easily compare it against the DB and delete the
orphan documents.. 

Please let me know if there are any other ideas / suggestions to implement
this.

Thanks,
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-fetch-the-complete-list-of-data-from-a-particular-column-in-SOLR-document-tp1435586p1435586.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Re: MoreLikethis and fq not giving exact results ?

2010-09-07 Thread Markus Jelsma
I can think of two useful cases for a feature with an optional mlt.fq parameter 
that limits the MLT results for each document, based on that fq:

 

1. prevent irrelevant docs when in a deep faceted navigation

2. general search results with MLT where you need to distinguish between 
collections when there are many different collections sharing the same index

 


 
-Original message-
From: Chris Hostetter hossman_luc...@fucit.org
Sent: Tue 07-09-2010 23:32
To: solr-user@lucene.apache.org; 
Subject: Re: MoreLikethis and fq not giving exact results ?

I don't believe the MLT Component has anyway of filtering like this.  In 
your case you want the fq params to apply to the MLT results as well as 
the main results, but in other cases people want the fq to apply to the 
main result set and let the MLT be per individual doc with no other 
filters -- no one has implemented a configurable way to say when/if 
certain fqs should apply in the way you describe.


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss      ...  Stump The Chump!


 


RE: Re: MoreLikethis and fq not giving exact results ?

2010-09-07 Thread Markus Jelsma
I know =)

 

I was just polling votes for a feature request - there is no such issue filed 
for this component. Perhaps there should be?
 
-Original message-
From: Chris Hostetter hossman_luc...@fucit.org
Sent: Wed 08-09-2010 00:13
To: solr-user@lucene.apache.org; 
Subject: RE: Re: MoreLikethis and fq not giving exact results ?

i don't disagree with you -- i was just commenting that it doesn't work 
that way at the moment, because it was designed with different use cases 
in mind (returning docs related to the result docs, independent of how you 
found those result docs)


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss      ...  Stump The Chump!



Invariants on a specific fq value

2010-09-08 Thread Markus Jelsma
Hi,

I have an index with several collections. Every document has a collection 
field that specifies the collection it belongs to. To make querying easier 
(and restrict exposed parameters) i have a request handler for each 
collection. The request handlers are largely the same and preset all 
parameters using invariants.

Well, this is all very nice. But there is a catch, i cannot make an invariant 
of the fq parameter because it's being used (from the outside) to navigate 
through the facets. This means that the outside world can specify any value 
for the fq parameter.

With the fq parameter being exposed, it is possible for request handler X to 
query documents that belong to collection Y and vice versa. But, as you might 
guess by now, request handler X should only be allowed to retrieve documents 
that belong to collection X.

I know there are some discussions on how to restrict users to certain 
documents but i'd like to know if it is doable to patch the request handler 
logic to add an invariant-like directive that allows me to restrict a certain 
value for a certain parameter, but allow different values for that parameter.

To give an example:

<requestHandler name="collection_x">
  <lst name="invariants">
    <str name="defType">dismax</str>
    ... More invariants here
  </lst>

  <lst name="what_should_we_call_this?">
    <str name="fq">fieldName:collection_x</str>
  </lst>
</requestHandler>

The above configuration won't allow the defType to be changed and won't allow a 
value to be specified for the fieldName field through the fq parameter. It will 
allow the outside world to specify a filter on another field through the fq 
parameter, such as fq=anotherField:someValue.

Any ideas? 


Cheers,

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: How to import data with a different date format

2010-09-08 Thread Markus Jelsma
No. The Datefield [1] will not accept it any other way. You could, however, 
fool your boss and dump your dates in an ordinary string field. But then you 
cannot use some of the nice date features.

 

[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
 
-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 17:36
To: solr-user@lucene.apache.org; 
Subject: How to import data with a different date format

Hi,

I am attempting to import some of our data into SOLR. I did it the quickest way 
I know because I literally only have 2 days to import the data and do some 
queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it 
to the format in solr/example/exampledocs (except I retained the element names 
so I had to modify schema.xml in the conf directory). So far so good -- the 
import works and I can search the data. One of my immediate problems is that 
there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it 
seems SOLR accepts only full date fields -- everything seems to be mandatory 
including the Z for Zulu/UTC time according to the doc. Is there a way to 
specify the date format?

Thanks very much.
Rico



RE: Re: How to import data with a different date format

2010-09-08 Thread Markus Jelsma
Your format (MM/DD/YYYY) is not compatible. 
 
-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 19:03
To: solr-user@lucene.apache.org; 
Subject: Re: How to import data with a different date format

That was my first thought :-) But it would be nice to be able to do date 
queries. I guess when I export the data I can just add 00:00:00Z.

Thanks.


- Original Message 
From: Markus Jelsma markus.jel...@buyways.nl
To: solr-user@lucene.apache.org
Sent: Wed, September 8, 2010 11:34:32 AM
Subject: RE: How to import data with a different date format

No. The Datefield [1] will not accept it any other way. You could, however, 
fool 
your boss and dump your dates in an ordinary string field. But then you cannot 
use some of the nice date features.



[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html 

-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 17:36
To: solr-user@lucene.apache.org; 
Subject: How to import data with a different date format

Hi,

I am attempting to import some of our data into SOLR. I did it the quickest way 
I know because I literally only have 2 days to import the data and do some 
queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it 
to the format in solr/example/exampledocs (except I retained the element names 
so I had to modify schema.xml in the conf directory). So far so good -- the 
import works and I can search the data. One of my immediate problems is that 
there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it 
seems SOLR accepts only full date fields -- everything seems to be mandatory 
including the Z for Zulu/UTC time according to the doc. Is there a way to 
specify the date format?

Thanks very much.
Rico

 


RE: Re: How to import data with a different date format

2010-09-08 Thread Markus Jelsma
Ah, that answers Erick's question. And mine ;) 
 
-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 19:25
To: solr-user@lucene.apache.org; 
Subject: Re: How to import data with a different date format

I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly 
easy in XSLT) and then adding T00:00:00Z to it.
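
For the record, that conversion is a one-liner in XSLT 1.0 (a fragment; the element and field names are assumptions):

<!-- Turns <date>09/08/2010</date> into 2010-09-08T00:00:00Z -->
<xsl:template match="date">
  <field name="date">
    <xsl:value-of select="concat(substring(., 7, 4), '-',
                                 substring(., 1, 2), '-',
                                 substring(., 4, 2), 'T00:00:00Z')"/>
  </field>
</xsl:template>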

Thanks.



- Original Message 
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, September 8, 2010 12:09:55 PM
Subject: Re: How to import data with a different date format

I think Markus is spot-on given the fact that you have 2 days. Using a
string field is quickest.

However, if you absolutely MUST have functioning dates, there are three
options I can think of:
1 can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant
2 use DIH and DateTransformer, see:
http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
     you can walk a directory importing all the XML files with
FileDataSource.
3 you could write a program to do this manually.

But given the time constraints, I suspect your time would be better spent
doing the other stuff and just using string as per Markus. I have no clue
how SOLR-savvy you are, so pardon if this is something you already know. But
lots of people trip up over the string field type, which is NOT tokenized.
You usually want text unless it's some sort of ID So it might be worth
it to do some searching earlier rather than later G

Best
Erick

On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nlwrote:

 No. The Datefield [1] will not accept it any other way. You could, however,
 fool your boss and dump your dates in an ordinary string field. But then you
 cannot use some of the nice date features.



 [1]:
 http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

 -Original message-
 From: Rico Lelina rlel...@yahoo.com
 Sent: Wed 08-09-2010 17:36
 To: solr-user@lucene.apache.org;
 Subject: How to import data with a different date format

 Hi,

 I am attempting to import some of our data into SOLR. I did it the quickest
 way
 I know because I literally only have 2 days to import the data and do some
 queries for a proof-of-concept.

 So I have this data in XML format and I wrote a short XSLT script to
 convert it
 to the format in solr/example/exampledocs (except I retained the element
 names
 so I had to modify schema.xml in the conf directory). So far so good -- the
 import works and I can search the data. One of my immediate problems is
 that
 there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it
 seems SOLR accepts only full date fields -- everything seems to be
 mandatory
 including the Z for Zulu/UTC time according to the doc. Is there a way to
 specify the date format?

 Thanks very much.
 Rico





RE: Re: Invariants on a specific fq value

2010-09-08 Thread Markus Jelsma
Interesting! I haven't come across the appends method before and i'll be sure to give 
it a try tomorrow. Though, the wiki [1] is not very clear on what it really does.

 

More suggestions before tomorrow?

 

[1]: http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication
 
-Original message-
From: Jonathan Rochkind rochk...@jhu.edu
Sent: Wed 08-09-2010 19:19
To: solr-user@lucene.apache.org; markus.jel...@buyways.nl; 
Subject: Re: Invariants on a specific fq value

I just found out about 'invariants', and I found out about another thing 
too: appends.   (I don't think either of these are actually documented 
anywhere?).

I think maybe appends rather than invariants, with your fq you want 
always to be there might be exactly what you want?

I actually forget whether it's append or appends, and am not sure if 
it's documented anywhere, try both I guess. But apparently it does exist 
in 1.4.

Jonathan

Markus Jelsma wrote:
 Hi,

 I have an index with several collections. Every document has a collection 
 field that specifies the collection it belongs to. To make querying easier 
 (and restrict exposed parameters) i have a request handler for each 
 collection. The request handlers are largely the same and preset all 
 parameters using invariants.

 Well, this is all very nice. But there is a catch, i cannot make an invariant 
 of the fq parameter because it's being used (from the outside) to navigate 
 through the facets. This means that the outside world can specify any value 
 for the fq parameter.

 With the fq parameter being exposed, it is possible for request handler X to 
 query documents that belong to collection Y and vice versa. But, as you might 
 guess by now, request handler X should only be allowed to retrieve documents 
 that belong to collection X.

 I know there are some discussions on how to restrict users to certain 
 documents but i'd like to know if it is doable to patch the request handler 
 logic to add an invariant-like directive that allows me to restrict a certain 
 value for a certain parameter, but allow different values for that parameter.

 To give an example:

 <requestHandler name="collection_x">
 <lst name="invariants">
 <str name="defType">dismax</str>
 ... More invariants here
 </lst>

 <lst name="what_should_we_call_this?">
 <str name="fq">fieldName:collection_x</str>
 </lst>
 </requestHandler>

 The above configuration won't allow the defType to be changed and won't allow a 
 value to be specified for the fieldName field through the fq parameter. It will 
 allow the outside world to specify a filter on another field through the fq 
 parameter, such as fq=anotherField:someValue.

 Any ideas? 


 Cheers,

 Markus Jelsma - Technisch Architect - Buyways BV
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350

   

 


RE: Re: Re: Invariants on a specific fq value

2010-09-08 Thread Markus Jelsma
Sounds great! I'll be very sure to put it to the test tomorrow and perhaps add 
documentation on these types to the solrconfigxml wiki page for reference.

 


 
-Original message-
From: Yonik Seeley yo...@lucidimagination.com
Sent: Wed 08-09-2010 19:38
To: solr-user@lucene.apache.org; 
Subject: Re: Re: Invariants on a specific fq value

On Wed, Sep 8, 2010 at 1:32 PM, Markus Jelsma markus.jel...@buyways.nl wrote:
 Interesting! I haven't come across the appends method before and i'll be sure to give 
 it a try tomorrow. Though, the wiki [1] is not very clear on what it really does.

Here's a comment from the example solrconfig.xml:

   <!-- In addition to defaults, "appends" params can be specified
        to identify values which should be appended to the list of
        multi-val params from the query (or the existing "defaults").

        In this example, the param "fq=instock:true" will be appended to
        any query time fq params the user may specify, as a mechanism for
        partitioning the index, independent of any user selected filtering
        that may also be desired (perhaps as a result of faceted searching).

        NOTE: there is *absolutely* nothing a client can do to prevent these
        "appends" values from being used, so don't use this mechanism
        unless you are sure you always want it.
     -->

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


RE: Re: Re: Invariants on a specific fq value

2010-09-08 Thread Markus Jelsma
Excellent! You already made my day for tomorrow! I'll check its behavior with 
fq parameters specifying a filter for the same field!
-Original message-
From: Chris Hostetter hossman_luc...@fucit.org
Sent: Wed 08-09-2010 21:04
To: solr-user@lucene.apache.org; 
Subject: RE: Re: Re: Invariants on a specific fq value


: Sounds great! I'll be very sure to put it to the test tomorrow and 
: perhaps add documentation on these types to the solrconfigxml wiki page 
: for reference.

SolrConfigXml wouldn't really be an appropriate place to document this 
-- it's not a general config item, it's a feature of the SearchHandler...

  http://wiki.apache.org/solr/SearchHandler

That wiki page already documented defaults, i've updated it to add 
details on appends and invariants.


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss      ...  Stump The Chump!



RE: svn branch issues

2010-09-09 Thread Markus Jelsma
 http://svn.apache.org/repos/asf/lucene/dev/branches/
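
The viewvc URL you used is the repository browser, not the repository itself; check out from the path above instead, e.g. (branch name as it currently stands):

svn co http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x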
 
-Original message-
From: Mark Allan mark.al...@ed.ac.uk
Sent: Thu 09-09-2010 10:44
To: solr-user@lucene.apache.org; 
Subject: svn branch issues

Hi all,

As I've mentioned in the past, I've created some custom field types  
which make use of the AbstractSubTypeFieldType class in the current  
trunk version of solr for a service we're working on.  We're getting  
close to putting our service into production (early 2011) and we're  
now looking for a stable version of Solr to use with these classes.   
Unfortunately, my field types don't compile against the current stable  
version (Solr 1.4) because of the missing AbstractSubTypeFieldType and  
other required classes.

Having had a look at JIRA to see the number of outstanding unresolved  
issues, I tried downloading the now defunct 1.5 branch on the  
assumption that it's more stable than the current trunk.  Whether or  
not that's a safe assumption remains to be seen!

Anyway, the problem is when I try to checkout the 1.5 branch, I get an  
error from subversion:

$ svn co http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev
svn: Repository moved permanently to '/viewvc/lucene/solr/branches/ 
branch-1.5-dev/'; please relocate

Going to http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev 
 in a browser shows the web view and contents of that branch, so  
something's not right with the subversion server.

Anyone got any pointers please?

Alternatively, how stable is the current trunk? Does it have a long  
way to go before being released as a stable version?

Many thanks
Mark

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: svn branch issues

2010-09-09 Thread Markus Jelsma
Well, it's under heavy development, but the 3.x branch is more likely to be 
released than 1.5.x, which is highly unlikely ever to be released.


On Thursday 09 September 2010 13:04:38 Mark Allan wrote:
 Thanks. Are you suggesting I use branch_3x and is that considered
 stable?
 Cheers
 Mark
 
 On 9 Sep 2010, at 10:47 am, Markus Jelsma wrote:
   http://svn.apache.org/repos/asf/lucene/dev/branches/
 
  -Original message-
  From: Mark Allan mark.al...@ed.ac.uk
  Sent: Thu 09-09-2010 10:44
  To: solr-user@lucene.apache.org;
  Subject: svn branch issues
 
  Hi all,
 
  As I've mentioned in the past, I've created some custom field types
  which make use of the AbstractSubTypeFieldType class in the current
  trunk version of solr for a service we're working on.  We're getting
  close to putting our service into production (early 2011) and we're
  now looking for a stable version of Solr to use with these classes.
  Unfortunately, my field types don't compile against the current stable
  version (Solr 1.4) because of the missing AbstractSubTypeFieldType and
  other required classes.
 
  Having had a look at JIRA to see the number of outstanding unresolved
  issues, I tried downloading the now defunct 1.5 branch on the
  assumption that it's more stable than the current trunk.  Whether or
  not that's a safe assumption remains to be seen!
 
  Anyway, the problem is when I try to checkout the 1.5 branch, I get an
  error from subversion:
 
  $ svn co http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev
  svn: Repository moved permanently to '/viewvc/lucene/solr/branches/
  branch-1.5-dev/'; please relocate
 
  Going to http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev
   in a browser shows the web view and contents of that branch, so
  something's not right with the subversion server.
 
  Anyone got any pointers please?
 
  Alternatively, how stable is the current trunk? Does it have a long
  way to go before being released as a stable version?
 
  Many thanks
  Mark
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Indexing checksum of field value

2010-09-09 Thread Markus Jelsma
Hi,

You can use an UpdateProcessor to do so. This can be used to deduplicate 
documents based on exact or near matches with fields in other documents. Check 
the wiki page on deduplication [1] for an example.

[1]: http://wiki.apache.org/solr/Deduplication
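
For illustration, a chain along the lines of the wiki example (MD5Signature ships with Solr; a true SHA-1 would need a small custom Signature subclass, and text_sha1 must exist in the schema as a stored field):

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">text_sha1</str>
    <!-- false: only store the checksum, don't overwrite near-duplicates -->
    <bool name="overwriteDupes">false</bool>
    <str name="fields">text</str>
    <str name="signatureClass">org.apache.solr.update.processor.MD5Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>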

Cheers,

On Thursday 09 September 2010 13:44:55 Staffan wrote:
 Hi,
 
 I am looking for a way to store the checksum of a field's value, something
  like:
 
 <field name="text" ...
 <!-- the SHA1 checksum of text (before applying analyzer) -->
 <field name="text_sha1" type="checksum" indexed="true" stored="true"
 ...
 <copyField source="text" dest="text_sha1"/>
 
 I haven't found anything like that in the docs or on google. Did I
 miss something? If not, would a custom tokenizer be a good way to
 implement it?
 
 /Staffan
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Inconsistent search results with multiple keywords

2010-09-09 Thread Markus Jelsma
Looks like AND is your defaultOperator [1]. Check your schema.xml and try 
adding q.op=OR to your query.

[1]: http://wiki.apache.org/solr/SearchHandler#q.op
 
On Thursday 09 September 2010 15:34:52 Stéphane Corlosquet wrote:
 Hi all,
 
 I'm new to solr so please let me know if there is a more appropriate place
 for my question below.
 
 I'm noticing a rather unexpected number of results when I add more keywords
 to a search. I'm listing below a example (where I replaced the real
  keywords with placeholders):
 
 keyword1 851 hits
 keyword1 keyword2  90 hits
 keyword1 keyword2 keyword3 269 hits
 keyword1 keyword2 keyword3 keyword4 47 hits
 
 As you can see, adding k2 narrows down the amount of results (as I would
 expect), but adding k3 to k1 and k2 suddenly increases the amount of
 results. with 4 keywords, the results have been narrowed down again. Would
 solr/lucene search algorithm with multiple keywords explain this non
 consistent behavior? I would think that adding more keywords would narrow
 down my results.
 
 I'm pasting below the relevant log in case it helps:
 
 INFO: [] webapp=/solr path=/select/
 params={spellcheck=true&facet=true&facet.mincount=1&facet.limit=20&spellcheck.q=keyword1+keyword2+keyword3+keyword4
 &json.nl=map&wt=json&version=1.2&rows=10&fl=id,nid,title,comment_count,type,created,changed,score,path,url,uid,name
 &start=0&facet.sort=true&q=keyword1+keyword2+keyword3+keyword4&bf=recip(rord(created),4,10704,10704)^200.0
 &facet.field=im_cck_field_author&facet.field=type&facet.field=im_vid_1&indent=on&start=0&version=2.2&rows=10} hits=10704 status=0 QTime=1
 
 any hint on whether this is expected or not appreciated.
 
 Steph.
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Garbled facets even in a zero hit search

2010-09-09 Thread Markus Jelsma
That's normal behavior if you haven't configured facet.mincount. Check the 
wiki.
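
For example, adding facet.mincount=1 to the request (or to the handler's defaults) drops all zero-count values from the facet lists.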

On Thursday 09 September 2010 16:05:01 Dennis Schafroth wrote:
 I am definitely not excluding the idea that the index is garbled, but it
  doesn't explain why I get facets on a zero-hit search.
 
 The schema is as follows:
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: Re: Inconsistent search results with multiple keywords

2010-09-09 Thread Markus Jelsma
Indeed, it's the dismax, i missed it! My bad.. 
 
-Original message-
From: Ahmet Arslan iori...@yahoo.com
Sent: Thu 09-09-2010 20:37
To: solr-user@lucene.apache.org; 
Subject: Re: Inconsistent search results with multiple keywords

 yes, my schema.xml file has <solrQueryParser defaultOperator="AND"/> which
 is why I thought that the number of hits would decrease every time you add a
 keyword.

You are using dismax so, it is determined by mm parameter.

http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
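
For reference: setting <str name="mm">100%</str> in the dismax handler's defaults makes every term required, which matches the AND behavior expected here.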


     


RE: roadmap

2010-09-09 Thread Markus Jelsma
You should check Jira's roadmap [1] instead. It shows a clear picture of what 
has been done since the 1.4.1 release and pending issues for the 3.x branch and 
others.

 

[1]: 
https://issues.apache.org/jira/browse/SOLR?report=com.atlassian.jira.plugin.system.project:roadmap-panel

 


 
-Original message-
From: Lukas Kahwe Smith m...@pooteeweet.org
Sent: Thu 09-09-2010 20:20
To: solr-user@lucene.apache.org; 
Subject: roadmap

Hi,

With the Lucene svn merge a lot of tentative release dates seemed to have 
slipped. Which is fine, because I think the merge is for the greater good of 
both projects in the long run.

However I do subscribe to the school of thought that believes OSS is best 
served with a 'release often' mantra. Of course such a one-time restructure can 
add a few months. So right now the main thing I feel a lot of people are 
wanting to hear is a tentative timeline for when to expect the next release and 
the key features that we can expect.

At least looking at http://lucene.apache.org/solr/ I do not see anything that 
communicates to the users where things are heading. Or am I just looking in the 
wrong place?

I hope I am not coming off as a whiny user; again, I am not telling you guys to 
work harder without me handing you a pay check. I am just suggesting that a bit 
more transparency as to what's going to happen in the near future would make it 
all the easier for us users to bet our futures on Solr :)

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





RE: Re: Re: Invariants on a specific fq value

2010-09-09 Thread Markus Jelsma
It works as expected. The append, well, appends the parameter and because each 
collection has a unique value, specifying two filters on different collections 
will always yield zero results.

 

This, of course, won't work for values that are shared between collections.
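
For the record, the handler from the original question then looks roughly like this (a sketch; handler and field names as in the earlier example):

<requestHandler name="collection_x" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">dismax</str>
  </lst>
  <lst name="appends">
    <str name="fq">fieldName:collection_x</str>
  </lst>
</requestHandler>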
 
-Original message-
From: Yonik Seeley yo...@lucidimagination.com
Sent: Wed 08-09-2010 19:38
To: solr-user@lucene.apache.org; 
Subject: Re: Re: Invariants on a specific fq value

On Wed, Sep 8, 2010 at 1:32 PM, Markus Jelsma markus.jel...@buyways.nl wrote:
 Interesting! I haven't come across the appends method before and i'll be sure to give 
 it a try tomorrow. Though, the wiki [1] is not very clear on what it really does.

Here's a comment from the example solrconfig.xml:

   <!-- In addition to defaults, "appends" params can be specified
        to identify values which should be appended to the list of
        multi-val params from the query (or the existing "defaults").

        In this example, the param "fq=instock:true" will be appended to
        any query time fq params the user may specify, as a mechanism for
        partitioning the index, independent of any user selected filtering
        that may also be desired (perhaps as a result of faceted searching).

        NOTE: there is *absolutely* nothing a client can do to prevent these
        "appends" values from being used, so don't use this mechanism
        unless you are sure you always want it.
     -->

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


RE: Help on spelling.

2010-09-09 Thread Markus Jelsma
I don't see you passing spellcheck parameters in the query string. Are they 
configured as default in your search handler?
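
Something along these lines should return suggestions (spellcheck.build=true is only needed once, to build the sidecar index):

http://localhost:8983/solr/biolibrary/spell/?q=text:wedg&spellcheck=true&spellcheck.build=true&wt=json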
 
-Original message-
From: Gregg Hoshovsky hosho...@ohsu.edu
Sent: Thu 09-09-2010 22:40
To: solr-user@lucene.apache.org; 
Subject: Help on spelling.

I am trying to use the spellchecker but cannot get past the point of having the 
spelling possibilities returned.

I have a text field define in the schema.xml file as:

  <field name="text" type="text_ws" indexed="true" stored="false" 
multiValued="true"/>

I modified solrconfig.xml to point the analyzer to the same field type and have 
the name set the same.

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

   <str name="queryAnalyzerFieldType">text_ws</str>

   <lst name="spellchecker">
     <str name="name">default</str>
     <str name="field">text</str>
     <str name="spellcheckIndexDir">./spellchecker</str>
   </lst>


I left the handler alone

 <requestHandler name="/spell" class="solr.SearchHandler" lazy="true">
   <lst name="defaults">

I see that the spellchecker folder gets files built so I am assuming that the 
spelling data is being created

Then I ran the query as
http://localhost:8983/solr/biolibrary/spell/?q=text:wedg&version=2.2&start=0&rows=10&indent=on&wt=json

I would expect that this would have returned some spelling suggestions (such 
as "wedge") but don't get anything besides:

{
"responseHeader":{
 "status":0,
 "QTime":1},
"response":{"numFound":0,"start":0,"docs":[]
}}

Any help is appreciated.

Gregg



RE: How to Update Value of One Field of a Document in Index?

2010-09-10 Thread Markus Jelsma
The MoreLikeThis component actually can accept external input:

http://wiki.apache.org/solr/MoreLikeThisHandler#Using_ContentStreams
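
For example, sending the text of a not-yet-indexed document straight to the handler (a sketch; assumes a /mlt handler is registered in solrconfig.xml, that content is an indexed field, and that the request parsers allow stream.body):

http://localhost:8983/solr/mlt?mlt.fl=content&mlt.mintf=1&mlt.mindf=1&stream.body=text+of+the+new+document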
 
-Original message-
From: Jonathan Rochkind rochk...@jhu.edu
Sent: Fri 10-09-2010 18:59
To: solr-user@lucene.apache.org; 
Subject: RE: How to Update Value of One Field of a Document in Index?

More like this is intended to be run at query time. For what reasons are you 
thinking you want to (re-)index each document based on the results of 
MoreLikeThis?  You're right that that's not what the component is intended for. 

Jonathan

From: Savannah Beckett [savannah_becket...@yahoo.com]
Sent: Friday, September 10, 2010 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: How to Update Value of One Field of a Document in Index?

Thanks.  I am trying to use MoreLikeThis in Solr to find similar documents in
the solr index and use the data from these similar documents to modify a field
in each document that I am indexing.  I found that MoreLikeThis in Solr only
works when the document is in the index, is it true?  If so, I may have to wait
til the indexing is finished, then run my own command to do MoreLikeThis to each
document in the index, and then reindex each document?  It sounds like it's not
efficient.  Is there a better way?
Thanks.





From: Liam O'Boyle liam.obo...@intelligencebank.com
To: solr-user@lucene.apache.org
Cc: u...@nutch.apache.org
Sent: Thu, September 9, 2010 11:06:36 PM
Subject: Re: How to Update Value of One Field of a Document in Index?

Hi Savannah,

You can only reindex the entire document; if you only have the ID,
then do a search to retrieve the rest of the data, then reindex.  This
assumes that all of the fields you need to index are stored (so that
you can retrieve them) and not just indexed.

Liam

On Fri, Sep 10, 2010 at 3:29 PM, Savannah Beckett
savannah_becket...@yahoo.com wrote:

 I use nutch to crawl and index to Solr.  My code is working.  Now, I want to
 update the value of one of the fields of a document in the solr index after
the
 document was already indexed, and I have only the document id.  How do I do
 that?

 Thanks.








RE: multivalued fields in result

2010-09-11 Thread Markus Jelsma
Yes, you'll get what is stored and asked for. 
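
A sketch of the schema side (field names invented):

<field name="labels" type="string" indexed="false" stored="true" multiValued="true"/>
<field name="labels_search" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="labels" dest="labels_search"/>

Requesting fl=labels then returns every stored value of the field for each document in the result.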
 
-Original message-
From: Jason Chaffee jchaf...@ebates.com
Sent: Sat 11-09-2010 05:27
To: solr-user@lucene.apache.org; 
Subject: multivalued fields in result

Is it possible to return multivalued fields in the result?  

I would like to have a multivalued field that is stored and not indexed (I also 
copy the same field into another field where it is tokenized and indexed).  I 
would then like all the values of this field returned in the result set.  Is 
there a way to do this?

If it is not possible, could someone elaborate why that is so that I may see if 
I can make it work.

thanks,

Jason


RE: Re: solr.DateField: org.apache.solr.common.SolrException: Error while creating field

2010-09-14 Thread Markus Jelsma
It would be a nice feature if Solr supported time zone aware queries on an 
index where all times are UTC. There is some chatter about this in SOLR-750 
but i haven't found an issue that would add support for time zone queries.

 

Did i do a lousy search or is the issue missing as of yet?
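
(In the meantime the client has to normalize to UTC itself; e.g. 2010-09-14T22:29:24+0200 becomes 2010-09-14T20:29:24Z: subtract the offset, then append Z.)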
 
-Original message-
From: Yonik Seeley yo...@lucidimagination.com
Sent: Tue 14-09-2010 22:58
To: solr-user@lucene.apache.org; 
Subject: Re: solr.DateField: org.apache.solr.common.SolrException: Error while 
creating field

On Tue, Sep 14, 2010 at 4:54 PM, h00kpub...@gmail.com
h00kpub...@googlemail.com wrote:
 SEVERE: org.apache.solr.common.SolrException: Error while creating field
 'metadata_last_modified{type=date,properties=indexed,stored,omitNorms}' from
 value '2010-09-14T22:29:24+0200'

Different timezones are currently not allowed - you must use UTC (hence
the Z timecode).

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


Re: Handling Aggregate Records/Roll-up in Solr

2010-09-16 Thread Markus Jelsma
You should  just flatten the representation of the shirt in the data model.
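
That is, index one representative document per shirt and keep the per-colour variation in multiValued fields, something like (a sketch; field names invented):

<doc>
  <field name="id">polo-123</field>
  <field name="name">Polo shirt</field>
  <field name="color">red</field>
  <field name="color">navy</field>
</doc>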


On Wednesday 15 September 2010 22:23:17 Thomas Martin wrote:
 Can someone point me to the mechanism in Solr that might allow me to
 roll up or aggregate records for display.  We have many items that are
 similar and only want to show a representative record to the user until
 they select that record.
 
 
 
 As an example - We carry a polo shirt and have 15 records that represent
 the individual colors for that shirt.  Does the query API provide any way
 to roll up the records based on a property or do we need to just flatten
 the representation of the shirt in the data model.
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: Re: Get all results from a solr query

2010-09-16 Thread Markus Jelsma
Not according to the wiki;

http://wiki.apache.org/solr/CommonQueryParameters#rows

 

But you could always create an issue for this one. 
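
In the meantime, a common workaround is to ask for zero rows first to learn the total, then fetch exactly that many (or page with start and rows):

http://localhost:8983/solr/select?q=*:*&rows=0          (read numFound)
http://localhost:8983/solr/select?q=*:*&rows=NUM_FOUND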
 
-Original message-
From: Christopher Gross cogr...@gmail.com
Sent: Thu 16-09-2010 22:50
To: solr-user@lucene.apache.org; 
Subject: Re: Get all results from a solr query

That will still just return 10 rows for me.  Is there something else in
the configuration of solr to have it return all the rows in the
results?

-- Chris



On Thu, Sep 16, 2010 at 4:43 PM, Shashi Kant sk...@sloan.mit.edu wrote:
 q=*:*

 On Thu, Sep 16, 2010 at 4:39 PM, Christopher Gross cogr...@gmail.com wrote:
 I have some queries that I'm running against a solr instance (older,
 1.2 I believe), and I would like to get *all* the results back (and
 not have to put an absurdly large number as a part of the rows
 parameter).

 Is there a way that I can do that?  Any help would be appreciated.

 -- Chris




Re: Search the mailinglist?

2010-09-17 Thread Markus Jelsma
http://www.lucidimagination.com/search/?q=


On Friday 17 September 2010 16:10:23 alexander sulz wrote:
   I'm sorry to bother you all with this, but is there a way to search through
 the mailing list archive? I've found
 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ so far
 but there isn't any convenient way to search through the archive.
 
 Thanks for your help
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



RE: Grouping in solr ?

2010-09-23 Thread Markus Jelsma
http://wiki.apache.org/solr/FieldCollapsing

https://issues.apache.org/jira/browse/SOLR-236
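
With the SOLR-236 patch applied, the query side would look roughly like this (parameter names per the FieldCollapsing wiki page; the patch is still in flux):

...&q=your+criteria&collapse.field=company_id

Since the collapsed results are still service documents, storing the (non-indexed) company display fields on each service document is the usual approach.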

 
-Original message-
From: Papp Richard ccode...@gmail.com
Sent: Thu 23-09-2010 21:29
To: solr-user@lucene.apache.org; 
Subject: Grouping in solr ?

Hi all,

 is it possible somehow to group documents?
 I have services as documents, and I would like to show the filtered
services grouped by company. 
 So I filter services by given criteria, but I show the results grouped by
company.
 If I get 1000 services, maybe I need to show just 100 companies (this will
affect pagination as well), and how could I get the company info? Should I
store the company info in each service (I don't need the company info to be
indexed)?

regards,
 Rich





