Re: Retrieving a field from all result documents & couple of more queries

2009-09-16 Thread abhay kumar
Hi,

1) Solr has various types of caches. We can specify how many documents a
cache can hold at a time.
   e.g. if queryResultWindowSize=50, then 50 results will be cached in the
queryResultCache. If the user requests results beyond those 50 documents, a
new request will be sent to the server & the server will fetch the next 50
results into the cache.
   http://wiki.apache.org/solr/SolrCaching
   Yes, Solr looks into the cache to retrieve the fields to be returned.
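   For reference, the relevant solrconfig.xml entries look roughly like this
(the sizes are only illustrative, not recommendations):

   <queryResultWindowSize>50</queryResultWindowSize>
   <queryResultCache class="solr.LRUCache"
                     size="512"
                     initialSize="512"
                     autowarmCount="256"/>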

2) Yes, we can have different tokenizers or filters for index & search. We
need not create a different fieldType; we configure the index & query
analyzer sections of the same fieldType (datatype) differently.

   e.g.

<fieldType name="textSpell" class="solr.TextField"
           positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.SynonymFilterFactory" synonyms="Synonyms.txt"
         ignoreCase="true" expand="false"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>



Regards,
Abhay

On Tue, Sep 15, 2009 at 6:41 PM, Shashikant Kore shashik...@gmail.com wrote:

 Hi,

 I am familiar with Lucene and trying out Solr.

 I have an index which was created outside Solr. The index is fairly
 simple, with two fields: document_id & content. The query result needs
 to return all the document IDs. The results need not be ordered by
 score. For this, in Lucene, I use a custom hit collector with search to
 get results quickly. The index has a few million documents, and queries
 returning hundreds of thousands of documents are not uncommon. So
 speed is crucial here.

 Since retrieving the document_id for each document is slow, I am using
 FieldCache to store the values of document_id. For all the results
 collected (in a bitset) with the hit collector, the document_id field is
 retrieved from the FieldCache.
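
 (Roughly, the pattern I mean, in Lucene 2.x terms; the field name is from
 my schema, everything else is standard Lucene:)

   import java.util.ArrayList;
   import java.util.BitSet;
   import java.util.List;
   import org.apache.lucene.index.IndexReader;
   import org.apache.lucene.search.FieldCache;
   import org.apache.lucene.search.HitCollector;
   import org.apache.lucene.search.IndexSearcher;
   import org.apache.lucene.search.Query;

   public class IdCollector {
     public static String[] collectIds(IndexSearcher searcher, Query query)
         throws Exception {
       IndexReader reader = searcher.getIndexReader();
       final String[] ids = FieldCache.DEFAULT.getStrings(reader, "document_id");
       final BitSet bits = new BitSet(reader.maxDoc());
       searcher.search(query, new HitCollector() {
         public void collect(int doc, float score) {
           bits.set(doc);             // score is computed but ignored
         }
       });
       // Map the collected Lucene doc ids to document_id values via the cache
       List<String> out = new ArrayList<String>();
       for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
         out.add(ids[i]);
       }
       return out.toArray(new String[out.size()]);
     }
   }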

 1. How can I effectively disable scoring? I have read that
 ConstantScoreQuery is quite fast, but from the code, I see that it is
 used only for wildcard queries. How can I use ConstantScoreQuery for
 all the queries (boolean, term, phrase, ..)?  Also, is
 ConstantScoreQuery as fast as a custom hit collector?

 2. How can Solr take advantage of the FieldCache while returning the
 field document_id? The documentation says the FieldCache can be
 explicitly auto-warmed with Solr. If the FieldCache is available and
 initialized at the beginning, will Solr look into the cache to
 retrieve the fields to be returned?

 3. If there is an additional field for stemmed_content on which search
 needs to use a different analyzer, I suppose that could be specified by
 the fieldType attribute in the schema.

 Thank you,

 --shashi



Re: How to create a new index file automatically

2009-09-16 Thread busbus



 It can import documents in certain other formats using the 
 http://wiki.apache.org/solr/ExtractingRequestHandler
 

1) According to my inference, Solr uses Apache Tika to convert rich document
formats to text, so that the ExtractingRequestHandler can use the extracted
text to build the index.

2) If point 1 is correct, then I think this could suit my requirements, since
I need to index rich document files, especially the .xls format. But I can't
find the ExtractingRequestHandler configuration that has to go into
solrconfig.xml so that I can import XLS documents through the servlet

http://localhost:8983/solr/update/extract?=
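
From the wiki page above, the handler is registered in solrconfig.xml
roughly like this (assuming the Solr Cell jars from contrib/extraction are
on the classpath):

  <requestHandler name="/update/extract"
      class="org.apache.solr.handler.extraction.ExtractingRequestHandler" />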
-- 
View this message in context: 
http://www.nabble.com/How-to-create-a-new-index-file-automatically-tp25455045p25466714.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr exception with missing required field (meta_guid_s)

2009-09-16 Thread Shalin Shekhar Mangar
On Wed, Sep 16, 2009 at 1:13 AM, kedardes kedar.w...@gmail.com wrote:


 Hi, I have a data-config file where I map the fields of a very simple table
 using dynamic field definitions :

 <document name="names">
   <entity name="names" query="select * from test">
     <field column="id" name="id_i" />
     <field column="name" name="name_s" />
     <field column="city" name="city_s" />
   </entity>
 </document>

 but when I run the dataimport I get this error:
 WARNING: Error creating document : SolrInputDocument[{id_i=id_i(1.0)={2},
 name_s=name_s(1.0)={John Smith}, city_s=city_s(1.0)={Newark}}]
 org.apache.solr.common.SolrException: Document [null] missing required
 field: meta_guid_s

 From the schema.xml I see that the meta_guid_s field is defined as a
 globally unique ID, but does this have to be set explicitly or mapped to a
 particular field?


You have created that schema, so you are the better person to answer that
question. As far as required fields or the uniqueKey are concerned, their
values have to be set or copied from another field.
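
For example, if the value already exists in another field, a copyField in
schema.xml can fill it in (the source field name here is just an assumption
based on your mapping):

<copyField source="id_i" dest="meta_guid_s"/>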

-- 
Regards,
Shalin Shekhar Mangar.


Re: Questions on copyField

2009-09-16 Thread Rahul R
Would appreciate any help on this. Thanks

Rahul
On Mon, Sep 14, 2009 at 5:12 PM, Rahul R rahul.s...@gmail.com wrote:

 Hello,
 I have a few questions regarding the copyField directive in schema.xml

 1. Does the destination field store a reference or the actual data?
 If I have something like this
 <copyField source="name" dest="text"/>
 then will the values in the 'name' field get copied into the 'text' field,
 or will the 'text' field only store a reference to the 'name' field? To put
 it more simply, if I later delete the 'name' field from the index, will I
 lose the corresponding data in the 'text' field?

 2. Is there any inbuilt API which I can use to do the copyField action
 programmatically?

 3. Can I do a copyField from the schema as well as programmatically for the
 same destination field?
 Suppose I want the 'text' field to contain values for name, age and
 location. In my index only 'name' and 'age' are defined as fields. So I can
 add directives like
 <copyField source="name" dest="text"/>
 <copyField source="age" dest="text"/>
 The location, however, I want to add to the 'text' field
 programmatically. I don't want to store the location as a separate field in
 the index. Can I do this?

 Thank you.

 Regards
 Rahul



Re: Solr results filtered on MoreLikeThis

2009-09-16 Thread Marcelk

Hi All,

Should I create a plugin for this, or is there some functionality in Solr
that can help me?

I basically already have part of what I want. The search response gives me a
result list with (in my situation) 20 results and the attached morelikethis
NamedList. Filtering based on the morelikethis 'duplicates' may result in 12
results, meaning my result list is not complete any more, since I requested
20 results. So now I need to do a new search, which I need to filter yet
again, and so on and so forth until I get a result of 20. This is not a
very robust implementation.

Can I do something like this on the Solr side (via a plugin)? For instance,
filter the Lucene hits based on the morelikethis or something like that,
so that I can return exactly 20 results, also adding the morelikethis to the
response. Grouping based on the morelikethis would even be a nice-to-have,
using the field collapsing functionality once it is fully implemented in
Solr.

I hope someone can give me some pointers in the right direction. 

Kind Regards,
Marcel

-- 
View this message in context: 
http://www.nabble.com/Solr-results-filtered-on-MoreLikeThis-tp25434881p25467907.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Questions on copyField

2009-09-16 Thread Shalin Shekhar Mangar
On Mon, Sep 14, 2009 at 5:12 PM, Rahul R rahul.s...@gmail.com wrote:

 Hello,
 I have a few questions regarding the copyField directive in schema.xml

 1. Does the destination field store a reference or the actual data ?


It makes a copy. Whether the destination field is stored or indexed depends
on that field's own configuration.


 If I have something like this
 <copyField source="name" dest="text"/>
 then will the values in the 'name' field get copied into the 'text' field,
 or will the 'text' field only store a reference to the 'name' field? To put
 it more simply, if I later delete the 'name' field from the index, will I
 lose the corresponding data in the 'text' field?


The values will get copied. If you delete all values of the 'name' field
from the index, the data in the 'text' field remains as-is.



 2. Is there any inbuilt API which I can use to do the copyField action
 programmatically?


No. But you can always copy explicitly before sending or you can use a
custom UpdateRequestProcessor to copy values from one field to another
during indexing.
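
A rough sketch of such a processor (Solr 1.4 plugin API; the field names are
only placeholders):

  import java.io.IOException;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;

  public class CopyLocationProcessor extends UpdateRequestProcessor {
    public CopyLocationProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      if (doc.getFieldValues("location") != null) {
        for (Object value : doc.getFieldValues("location")) {
          doc.addField("text", value);   // append to the destination field
        }
        doc.removeField("location");     // don't index/store the source
      }
      super.processAdd(cmd);             // pass the doc down the chain
    }
  }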


 3. Can I do a copyField from the schema as well as programmatically for the
 same destination field?
 Suppose I want the 'text' field to contain values for name, age and
 location. In my index only 'name' and 'age' are defined as fields. So I can
 add directives like
 <copyField source="name" dest="text"/>
 <copyField source="age" dest="text"/>
 The location, however, I want to add to the 'text' field
 programmatically.
 I don't want to store the location as a separate field in the index. Can I
 do this?


You can send the location's value directly as the value of the text field.
Also note that you don't really need to index/store the source field: you
can make the location field's type 'ignored' in the schema.
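
In the example schema that ships with Solr, the 'ignored' type is declared
like this:

<fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />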

-- 
Regards,
Shalin Shekhar Mangar.


Need help to finalize my autocomplete

2009-09-16 Thread Vincent Pérès

Hello,

I'm using the following code for my autocomplete feature:

The field type:

<fieldType name="autoComplete" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20"
            minGramSize="2" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(.{20})(.*)?" replacement="$1" replace="all" />
  </analyzer>
</fieldType>

The field:

<dynamicField name="*_ac" type="autoComplete" indexed="true" stored="true" />

The query:
?q=*:*&fq=query_ac:harry*&wt=json&rows=15&start=0&fl=*&indent=on&fq=model:SearchQuery

It gives me a list of results I can parse and use with jQuery autocomplete
plugin and all that works very well.

Example of results:
 harry
 harry potter
 the last fighting harry
 harry potter 5
 comic relief harry potter

What I would like to do now is to get only results starting with the query,
so it should be:
 harry
 harry potter
 harry potter 5

Can anybody tell me if it is possible, and if so, how to do it?

Thank you !
Vincent
-- 
View this message in context: 
http://www.nabble.com/Need-help-to-finalize-my-autocomplete-tp25468885p25468885.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Need help to finalize my autocomplete

2009-09-16 Thread Avlesh Singh
Instead of <tokenizer class="solr.WhitespaceTokenizerFactory"/>, use
<tokenizer class="solr.KeywordTokenizerFactory"/>. The whitespace tokenizer
builds edge n-grams per word, which is why "harry" also matches mid-title;
the keyword tokenizer keeps the whole value as a single token, so the grams
become true prefixes.

Cheers
Avlesh





Re: Need help to finalize my autocomplete

2009-09-16 Thread Vincent Pérès

Hello,

I tried to replace the class as you suggested, but I still get the same
result (and not only results starting with the given query).

<fieldType name="autoComplete" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20"
            minGramSize="2" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(.{20})(.*)?" replacement="$1" replace="all" />
  </analyzer>
</fieldType>
-- 
View this message in context: 
http://www.nabble.com/Need-help-to-finalize-my-autocomplete-tp25468885p25469239.html
Sent from the Solr - User mailing list archive at Nabble.com.



Mapping SolrDoc to SolrInputDoc

2009-09-16 Thread Licinio Fernández Maurelo
Hi there,

currently I'm working on a small app which creates an EmbeddedSolrServer,
reads all documents from one core and puts those docs into another one.

The purpose of this app is to apply (small) schema.xml changes to already
indexed data (offline), resulting in a new index whose documents reflect
the schema.xml changes.

What I want to know is whether there is an easy way to map a SolrDocument
to a SolrInputDocument.

Any help would be much appreciated

-- 
Lici


Re: Need help to finalize my autocomplete

2009-09-16 Thread Shalin Shekhar Mangar
2009/9/16 Vincent Pérès vincent.pe...@gmail.com


 Hello,

 I tried to replace the class as you suggested, but I still get the same
 result (and not only results starting with the given query).


Make sure you re-index your documents after changing the schema.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Need help to finalize my autocomplete

2009-09-16 Thread Vincent Pérès

After re-indexing it works very well! Thanks a lot!

Vincent
-- 
View this message in context: 
http://www.nabble.com/Need-help-to-finalize-my-autocomplete-tp25468885p25469931.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Mapping SolrDoc to SolrInputDoc

2009-09-16 Thread Martijn v Groningen
Hi Licinio,

You can use ClientUtils.toSolrInputDocument(...), that converts a
SolrDocument to a SolrInputDocument.
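
A rough SolrJ sketch of the copy loop (core names assumed; a real run would
page through results with setStart/setRows instead of one fixed batch):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.client.solrj.util.ClientUtils;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.core.CoreContainer;

  public class CoreCopier {
    public static void copy(CoreContainer container) throws Exception {
      SolrServer source = new EmbeddedSolrServer(container, "old-core");
      SolrServer target = new EmbeddedSolrServer(container, "new-core");
      // Fetch documents from the source core and re-add them to the target,
      // converting each SolrDocument to a SolrInputDocument on the way.
      SolrQuery query = new SolrQuery("*:*").setRows(1000);
      for (SolrDocument doc : source.query(query).getResults()) {
        target.add(ClientUtils.toSolrInputDocument(doc));
      }
      target.commit();
    }
  }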

Martijn





-- 
Met vriendelijke groet,

Martijn van Groningen


Re: Mapping SolrDoc to SolrInputDoc

2009-09-16 Thread Licinio Fernández Maurelo
I'll try, thanks Martijn





-- 
Lici


Re: Solr results filtered on MoreLikeThis

2009-09-16 Thread Chantal Ackermann
Have you had a look at the facet query? Not sure but it might just do 
what you are looking for.


http://wiki.apache.org/solr/SolrFacetingOverview
http://wiki.apache.org/solr/SimpleFacetParameters
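
For example, a request along these lines (the relatedIds field is just an
assumption about your schema):

http://localhost:8983/solr/select?q=java&facet=true&facet.field=relatedIds&facet.mincount=1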









Re: Solr results filtered on MoreLikeThis

2009-09-16 Thread Marcelk

Hi Chantal,


Chantal Ackermann wrote:
 
 Have you had a look at the facet query? Not sure but it might just do 
 what you are looking for.
 
 http://wiki.apache.org/solr/SolrFacetingOverview
 http://wiki.apache.org/solr/SimpleFacetParameters
 

I still don't really understand faceting, but it might help me using the
following trick.

When I index a document I check for morelikethis. Then each morelikethis
document and the indexed element itself get references to each other via a
relatedIds array field. Then (maybe using faceting) I will filter the
results based on the id in its own relatedIds. I don't yet know how to do
that, but perhaps you understand how this could be done?

Example:

document1
   - id = 1
   - relatedIds = [2,3,4,5]
   - content = 'some cool java job'
document2
   - id = 2
   - relatedIds = [1,3,4,5]
   - content = 'another cool java job'
document3
   - id = 3
   - relatedIds = [1,2,4,5]
   - content = 'yet another cool java job'
etc...
document6
   - id = 6
   - relatedIds = []
   - content = 'this java article is for you';
document7
   - id=7
   - relatedIds = [8]
   - content = 'nice java book'
document8
   - id=8
   - relatedIds = [7]
   - content = 'java book looks nice'

Now when I search, I would like to have the following results:

- document1 (4 related documents)
- document6
- document7 (1 related document)

Could you give me an example of how I could get that result, maybe using
facets?

Kind Regards,
Marcel
-- 
View this message in context: 
http://www.nabble.com/Solr-results-filtered-on-MoreLikeThis-tp25434881p25470762.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Retrieving a field from all result documents & couple of more queries

2009-09-16 Thread Shashikant Kore
Thanks, Abhay.

Can someone please throw light on how to disable scoring?

--shashi





Re: Extract info from parent node during data import (redirect:)

2009-09-16 Thread Paul, Noble
Fergus,

Implementing wildcard xpaths (//tagname) is definitely possible. I would
love to see it working, and if you wish to take a dig at it I shall do
whatever I can to help.

What is the use case that makes flow through so useful?
We do not know which forEach xpath a given field is associated with.
Currently you can clean up the fields using a transformer. There is an
implicit field '$forEach' which tells you the forEach xpath for each
record that is emitted.
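
For illustration, a rough transformer sketch (Solr 1.4 DIH API; the xpath
and field names here are only placeholders):

  import java.util.Map;
  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.Transformer;

  public class SubDocCleanupTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
      // '$forEach' holds the forEach xpath that emitted this record
      if ("/record/mediaBlock".equals(row.get("$forEach"))) {
        row.remove("text");  // drop a hypothetical parent-level field
      }
      return row;
    }
  }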

The recently added comments in XPathRecordReader are a great help and I
was planning to add more. Might this be an issue?
I would love to have it. Give a patch and I shall commit it.
XPathRecordReader is a black box and AFAIK I am the only one who knows
it. I would love to have more eyes on that.

I would like to open a JIRA for improving XPathRecordReader.
Please go ahead. You can paste the contents of this mail in the list.
There may be others with similar ideas.

Noble.
-Original Message-
Noble

/document/category/item | /document/category

means there are two xpaths which trigger a new doc (it is possible to
have more). Whenever it encounters the closing tag of such an xpath, it
emits all the fields it has collected since the opening of the same tag.
After that it clears all the fields it collected since the opening of
the tag.

If there are fields it collected before the opening of the same tag, it
retains them.


 Nice and clear, but that is not what I see.

 With my test case with forEach="/record | /record/mediaBlock"
 I see that each /record/mediaBlock document indexed contains
 all fields from the parent /record document as well. A search over
 mediaBlocks returns lots of extra fields from the parent which did
 not have the commonField attribute. I will try and produce a testcase.

Yes it does: /record/mediaBlock will have all the fields collected
from /record as well. *It is by design*

Oh!

I had always considered it a bug, or at least a limitation. After all, if
we have the commonField attribute, why do we need an automatic
flow-through of all collected fields from parent nodes? This feature is,
as far as I can see, undocumented and at the same time unintuitive.
It also, in my case, causes tons more information to be indexed than is
needed.

I have spent a while thinking through possible use cases. My use case
involves having documents we want to search as a whole, behaving as
normal. At the same time these documents contain inner sections we wish
to treat as sub-documents; in my case I have pictures with associated
captions which I wish to search separately. Having indexed the documents
with forEach="/record | /record/mediaBlock", my picture search works
nicely, but I get a nasty side effect when performing searches over the
rest of the document. Because fields from the parent node are also
present in the children, when I search for any text the same document
gets returned many times: once for the text in the parent node, and
again for each picture placed in the document. I have a workaround for
this issue but have always considered it a bug.

What is the use case that makes flow through so useful?

I had just started playing with the code to see how easy this would be
to change. The recently added comments in XPathRecordReader are a great
help and I was planning to add more. Might this be an issue?

I have noted, while lurking on the Solr mailing lists, that requests for
this type of functionality keep coming up: being able to restrict
searches to a sub-section of a document. I have really needed this sort
of thing many times with the type of material I work with.

My other planned activity was to see how easy xpaths such as //tagname
would be to implement. Since my latest data-config.xml looks like:

<field column="para32" name="text" xpath="/record/address/para" flatten="true" />
<field column="para40" name="text" xpath="/record/authoredBy/para" flatten="true" />
<field column="para43" name="text" xpath="/record/dataGroup/address/para" flatten="true" />
<field column="para47" name="text" xpath="/record/dataGroup/keyPersonnel/doubleList/first/para" flatten="true" />
<field column="para49" name="text" xpath="/record/dataGroup/keyPersonnel/doubleList/second/para" flatten="true" />
<field column="para50" name="text" xpath="/record/dataGroup/keyPersonnel/para" flatten="true" />
<field column="para51" name="text" xpath="/record/dataGroup/para" flatten="true" />
<field column="para57" name="text" xpath="/record/doubleList/first/para" flatten="true" />
<field column="para59" name="text" xpath="/record/doubleList/second/para" flatten="true" />
<field column="para63" name="text" xpath="/record/keyPersonnel/doubleList/first/para" flatten="true" />
<field column="para65" name="text" xpath="/record/keyPersonnel/doubleList/second/para" flatten="true" />
<field column="para68" name="text" xpath="/record/list/listItem/para" flatten="true" />
<field column="para75" name="text" xpath="/record/mediaBlock/doubleList/first/para" flatten="true" />
<field column="para77" name="text" xpath="/record/mediaBlock/doubleList/second/para" flatten="true" />

Re: multicore shards and relevancy score

2009-09-16 Thread Shalin Shekhar Mangar
On Tue, Sep 15, 2009 at 8:11 PM, Paul Rosen p...@performantsoftware.com wrote:


 The second issue was detailed in an email last week, "shards and facet
 count". The facet information is lost when doing a search over two shards,
 so if I use multicore, I can no longer have facets.


If both cores' schemas are the same and a uniqueKey is specified, then you
can do a distributed search across the two cores. Facets work fine with
distributed search. There may be something wrong with your setup.
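
For example (hosts, core names and the facet field are assumed):

http://localhost:8983/solr/core0/select?q=*:*&facet=true&facet.field=genre_facet&shards=localhost:8983/solr/core0,localhost:8983/solr/core1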

-- 
Regards,
Shalin Shekhar Mangar.


DeltaImport problem

2009-09-16 Thread KirstyS

I hope this is the correct place to post this issue and, if so, that someone
can help.
I am using the DIH with Solr 1.3.
My data-config.xml file looks like this:
<dataSource
    driver="net.sourceforge.jtds.jdbc.Driver"
    url="jdbc:jtds:sqlserver:{taken out for posting}"
    user="{taken out for posting}"
    password="{taken out for posting}" />

<entity name="article" pk="CmsArticleId"
    query="Select a.CmsArticleId, a.CreatorId, LastUpdatedBy, LastUpdateDate,
              Title, Synopsis, Author, Source, IsPublished, ArticleTypeId,
              a.StrapHead, ShortHeading, HomePageBlurb, ByLine, ArticleStatusId,
              LiveEdit, a.OriginalCategoryId, aac.Rank, c.CategoryId,
              AncestralName, CategoryName, CategoryDisplayName,
              ParentCategoryId, c.SiteId
           from Category c (nolock)
              inner join CmsArticleCollection ac (nolock)
                  on c.DefaultCmsArticleCollectionId = ac.CmsArticleCollectionId
              inner join CmsArticleArticleCollection aac (nolock)
                  on ac.CmsArticleCollectionId = aac.CmsArticleCollectionId
              inner join CmsArticle a (nolock)
                  on aac.CmsArticleId = a.CmsArticleId
           where (a.LiveEdit is null or a.LiveEdit = 0)
              and aac.SourceCmsArticleArticleCollectionId is null"
    deltaQuery="Select a.CmsArticleId, a.CreatorId, LastUpdatedBy, LastUpdateDate,
              Title, Synopsis, Author, Source, IsPublished, ArticleTypeId,
              a.StrapHead, ShortHeading, HomePageBlurb, ByLine, ArticleStatusId,
              LiveEdit, a.OriginalCategoryId, aac.Rank, c.CategoryId,
              AncestralName, CategoryName, CategoryDisplayName,
              ParentCategoryId, c.SiteId
           from Category c (nolock)
              inner join CmsArticleCollection ac (nolock)
                  on c.DefaultCmsArticleCollectionId = ac.CmsArticleCollectionId
              inner join CmsArticleArticleCollection aac (nolock)
                  on ac.CmsArticleCollectionId = aac.CmsArticleCollectionId
              inner join CmsArticle a (nolock)
                  on aac.CmsArticleId = a.CmsArticleId
           where (a.LiveEdit is null or a.LiveEdit = 0)
              and aac.SourceCmsArticleArticleCollectionId is null
              and (LastUpdateDate > '${dataimporter.last_index_time}'
                   or a.CreationDate > '${dataimporter.last_index_time}')">

I have tried casting dataimporter.last_index_time and the other date
fields, to no avail. My full-import works perfectly, but I cannot get
command=delta-import to pick up the updated records, even though the
LastUpdateDate is being updated. When I run this in the debug interface
with delta-import, it just never runs the delta import.
Please, does anyone know what I am doing wrong?
Many thanks
Kirsty
-- 
View this message in context: 
http://www.nabble.com/DeltaImport-problem-tp25471596p25471596.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DeltaImport problem

2009-09-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
I vaguely remember there was an issue with delta-import in 1.3. Could
you try it out with Solr 1.4?






-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: DeltaImport problem

2009-09-16 Thread KirstyS

I thought 1.4 was not released yet? 


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
 I vaguely remember there was an issue with delta-import in 1.3. could
 you try it out with Solr1.4
 

-- 
View this message in context: 
http://www.nabble.com/DeltaImport-problem-tp25471596p25471927.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Retrieving a field from all result documents & couple of more queries

2009-09-16 Thread rajan chandi
You might be talking about modifying the Similarity object to change the
scoring formula in Lucene:

  $searcher->setSimilarity($similarity);
  $writer->setSimilarity($similarity);


This can very well be done in Solr, as SolrIndexWriter inherits from the
Lucene IndexWriter class. You might want to download the Solr source code
and take a look at SolrIndexWriter to begin with.

It's in the package - org.apache.solr.update
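
A custom Similarity can also be declared in schema.xml (the class name here
is only a placeholder):

  <similarity class="org.example.MySimilarity"/>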

Thanks
Rajan




Re: DeltaImport problem

2009-09-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
Yeah, it is not yet released, but it is going to be released pretty soon.






-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: DeltaImport problem

2009-09-16 Thread KirstyS

Mmm.. I can't seem to find the link. Could you help?


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
 yeah, not yet released but going to be released pretty soon
 

-- 
View this message in context: 
http://www.nabble.com/DeltaImport-problem-tp25471596p25472102.html
Sent from the Solr - User mailing list archive at Nabble.com.



When to use Solr over Lucene

2009-09-16 Thread balaji.a

Hi All,
   I am aware that Solr internally uses Lucene for search and indexing, but
it would be helpful if anybody could explain the Solr features that are not
provided by Lucene.

Thanks,
Balaji.
-- 
View this message in context: 
http://www.nabble.com/When-to-use-Solr-over-Lucene-tp25472354p25472354.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Retrieving a field from all result documents & couple of more queries

2009-09-16 Thread Shashikant Kore
No, I don't wish to plug in a custom Similarity. Rather, I want an
equivalent of HitCollector where I can bypass the scoring altogether,
and I'd prefer to do it by changing the configuration.

--shashi




Re: DeltaImport problem

2009-09-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
http://people.apache.org/builds/lucene/solr/nightly/

On Wed, Sep 16, 2009 at 6:42 PM, KirstyS kirst...@gmail.com wrote:

 mmm..can't seem to find the link..could you help?


 Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

 yeah, not yet released but going to be released pretty soon

 On Wed, Sep 16, 2009 at 6:32 PM, KirstyS kirst...@gmail.com wrote:

 I thought 1.4 was not released yet?


 Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

 I vaguely remember there was an issue with delta-import in 1.3. could
 you try it out with Solr1.4

 On Wed, Sep 16, 2009 at 6:14 PM, KirstyS kirst...@gmail.com wrote:

 I hope this is the correct place to post this issue and if so, that
 someone
 can help.
 I am using the DIH with Solr 1.3
 My data-config.xml file looks like this:
 dataSource
        driver=net.sourceforge.jtds.jdbc.Driver
                    url=jdbc:jtds:sqlserver:{taken out for posting}
        user={taken out for posting}
        password={taken out for posting} /

  <entity name="article" pk="CmsArticleId"
      query="Select a.CmsArticleId, a.CreatorId, LastUpdatedBy,
             LastUpdateDate, Title, Synopsis, Author, Source, IsPublished,
             ArticleTypeId, a.StrapHead, ShortHeading, HomePageBlurb, ByLine,
             ArticleStatusId, LiveEdit, a.OriginalCategoryId, aac.Rank,
             c.CategoryId, AncestralName, CategoryName, CategoryDisplayName,
             ParentCategoryId, c.SiteId
             from Category c (nolock)
                  inner join CmsArticleCollection ac (nolock)
                     on c.DefaultCmsArticleCollectionId = ac.CmsArticleCollectionId
                  inner join CmsArticleArticleCollection aac (nolock)
                     on ac.CmsArticleCollectionId = aac.CmsArticleCollectionId
                  inner join CmsArticle a (nolock)
                     on aac.CmsArticleId = a.CmsArticleId
             where (a.LiveEdit is null or a.LiveEdit = 0)
                   and aac.SourceCmsArticleArticleCollectionId is null"
      deltaQuery="Select a.CmsArticleId, a.CreatorId, LastUpdatedBy,
             LastUpdateDate, Title, Synopsis, Author, Source, IsPublished,
             ArticleTypeId, a.StrapHead, ShortHeading, HomePageBlurb, ByLine,
             ArticleStatusId, LiveEdit, a.OriginalCategoryId, aac.Rank,
             c.CategoryId, AncestralName, CategoryName, CategoryDisplayName,
             ParentCategoryId, c.SiteId
             from Category c (nolock)
                  inner join CmsArticleCollection ac (nolock)
                     on c.DefaultCmsArticleCollectionId = ac.CmsArticleCollectionId
                  inner join CmsArticleArticleCollection aac (nolock)
                     on ac.CmsArticleCollectionId = aac.CmsArticleCollectionId
                  inner join CmsArticle a (nolock)
                     on aac.CmsArticleId = a.CmsArticleId
             where (a.LiveEdit is null or a.LiveEdit = 0)
                   and aac.SourceCmsArticleArticleCollectionId is null
                   and (LastUpdateDate &gt; '${dataimporter.last_index_time}'
                        OR a.CreationDate &gt; '${dataimporter.last_index_time}')" >

 I have tried casting the dataimporter.last_index_time and the other date
 fields, to no avail. My full import works perfectly, but I cannot get the
 command=delta-import to pick up the updated records. The LastUpdateDate is
 being updated. When I run this in the debug interface with delta-import it
 just never calls the delta import.
 Please, if anyone knows what I am doing wrong, do let me know.
 Many thanks
 Kirsty
 --
 View this message in context:
 http://www.nabble.com/DeltaImport-problem-tp25471596p25471596.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



 --
 View this message in context:
 http://www.nabble.com/DeltaImport-problem-tp25471596p25471927.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



 --
 View this message in context: 
 http://www.nabble.com/DeltaImport-problem-tp25471596p25472102.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com
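
For reference, DIH's delta flow is conventionally split in two: deltaQuery
returns only the primary keys of the changed rows, and deltaImportQuery (an
attribute added in Solr 1.4) fetches each full row by key. A minimal sketch
along those lines, reusing the entity and column names from the config above,
with '...' standing for the full column list and joins already shown, and the
comparison operators written as &gt; since the queries live inside an XML
attribute:

  <entity name="article" pk="CmsArticleId"
      query="Select ... from Category c (nolock) inner join ..."
      deltaQuery="Select a.CmsArticleId from CmsArticle a (nolock)
                  where a.LastUpdateDate &gt; '${dataimporter.last_index_time}'
                     or a.CreationDate &gt; '${dataimporter.last_index_time}'"
      deltaImportQuery="Select ... where
                  a.CmsArticleId = '${dataimporter.delta.CmsArticleId}'">
  </entity>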


Re: When to use Solr over Lucene

2009-09-16 Thread Grant Ingersoll


On Sep 16, 2009, at 9:26 AM, balaji.a wrote:



Hi All,
  I am aware that Solr internally uses Lucene for search and indexing. But
it would be helpful if anybody could explain the Solr features that are not
provided by Lucene.



Solr is a server, Lucene is an API
Faceting
Distributed search
Replication
Easy configuration
You don't want to program much (or do Java)
Index warming

http://lucene.apache.org/solr/features.html

Generally speaking, Solr is what you end up building when you build a  
Lucene search application, give or take a few features here and  
there.  I've seen a lot of Lucene apps and I'm always amazed how many  
look pretty much like Solr in terms of infrastructure.


I'd use Lucene when you want to have control over every last bit of  
how things work or you need something that isn't in Solr (like Span  
Queries, but even that is doable in Solr w/ a little work)




Thanks,
Balaji.
--
View this message in context: 
http://www.nabble.com/When-to-use-Solr-over-Lucene-tp25472354p25472354.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: When to use Solr over Lucene

2009-09-16 Thread Israel Ekpo
Comparing Solr to Lucene is not exactly an apples-to-apples comparison.

Solr is a superset of Lucene. It uses the Lucene engine to index and process
requests for data retrieval.

Start here first:
http://lucene.apache.org/solr/features.html#Solr+Uses+the+Lucene+Search+Library+and+Extends+it

It would be unfair to compare the Apache web server to a CGI scripting
interface.

The Apache web server is just the container through which the web browser
interacts with the CGI scripts.

This is very similar to how Solr is related to Lucene.

On Wed, Sep 16, 2009 at 9:26 AM, balaji.a reachbalaj...@gmail.com wrote:


 Hi All,
   I am aware that Solr internally uses Lucene for search and indexing. But
  it would be helpful if anybody could explain the Solr features that are not
 provided by Lucene.

 Thanks,
 Balaji.
 --
 View this message in context:
 http://www.nabble.com/When-to-use-Solr-over-Lucene-tp25472354p25472354.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: When to use Solr over Lucene

2009-09-16 Thread Mark Miller
balaji.a wrote:
 Hi All,
    I am aware that Solr internally uses Lucene for search and indexing. But
  it would be helpful if anybody could explain the Solr features that are not
 provided by Lucene.

 Thanks,
 Balaji.
   
Any advanced Lucene application generally goes down the same path:

Build a system to manage IndexReaders and IndexWriters and concurrency
and index view refreshing. This is very
hard for beginners to get right - though many have tried.

Figure out how you want to manage (or not) some kind of schema. Write
something that *basically* does the job. Write something so that non-Java
programmers can set up the schema so you don't have to.

Add niceties on top, like support for efficient autocomplete and
spellchecking and faceting and plugins.

Figure out a scheme to replicate and distribute indexes so that you can
scale.

Add support for other APIs. REST, perl, whatever else your crazy
superiors are pulling from your crazy coworkers.

Add support for parsing rich documents, like pdfs, ms word, and dozens
of other formats.

Do it in a short time with a small team. Spend a lot of time fixing bugs
and whacking at performance issues. Get most of it wrong, because you
will the first time you do this. If you're lucky: get a lot of it right
too and feel great about your large complicated system as you hurry to
fix all of its many imperfections - and then spend lots of time keeping
up with the latest changes, features, and improvements added to Lucene.
Or sit on the old features frozen in time. You won't have done it all
either - it's too much work to do it all well in a reasonable amount of
time for a dev team that is not actually supposed to be building a
search server. You will cut stuff, you will skimp on stuff, and you will
make tradeoffs left and right.

I've gone down that path - I started *just* before Solr got rolling in
06. Lots of people have gone down that path or are on that path.

Solr does all of that for you, and it does it well. Many of those that
work on Lucene work on Solr. New Lucene features automatically go into
Solr. Solr will be maintained and developed by a team of people that are
not you, while your homegrown system (which does only 60% of what Solr
does and does it worse) will likely cobweb over 95% of the code. I love
developing with Lucene, and I bet you will too - but most people should
be using Solr.

Certain target applications can still benefit from using Lucene. Some Lucene
features don't move to Solr for a while. If you want near real-time,
that's only in Lucene right now. If you want everything done per segment,
that's just Lucene right now (Solr still does some things not per segment).
There are other little pros as well.

It's a tradeoff that, for the general guy looking for search, heavily
favors using Solr.

-- 
- Mark

http://www.lucidimagination.com





Re: Retrieving a field from all result documents & couple of more queries

2009-09-16 Thread rajan chandi
You will need to get SolrIndexSearcher.java and modify the following:

public static final int GET_SCORES =   0x01;


--Rajan

On Wed, Sep 16, 2009 at 6:58 PM, Shashikant Kore shashik...@gmail.com wrote:

 No, I don't wish to put a custom Similarity.  Rather, I want an
 equivalent of HitCollector where I can bypass the scoring altogether.
 And I prefer to do it by changing the configuration.

 --shashi

 On Wed, Sep 16, 2009 at 6:36 PM, rajan chandi chandi.ra...@gmail.com
 wrote:
  You might be talking about modifying the similarity object to modify
 scoring
  formula in Lucene!
 
   $searcher->setSimilarity($similarity);
   $writer->setSimilarity($similarity);
 
 
  This can very well be done in Solr as SolrIndexWriter inherits from
 Lucene
  IndexWriter class.
  You might want to download the Solr Source code and take a look at the
  SolrIndexWriter to begin with!
 
  It's in the package - org.apache.solr.update
 
  Thanks
  Rajan
 
  On Wed, Sep 16, 2009 at 5:42 PM, Shashikant Kore shashik...@gmail.com
 wrote:
 
  Thanks, Abhay.
 
  Can someone please throw light on how to disable scoring?
 
  --shashi
 
  On Wed, Sep 16, 2009 at 11:55 AM, abhay kumar abhay...@gmail.com
 wrote:
   Hi,
  
   1) Solr has various types of caches. We can specify how many documents a
   cache can hold at a time.
      e.g. if windowSize=50, 50 results will be cached in the queryResult
      cache. If the user makes a new request to the server for results after
      50 documents, a new request will be sent to the server & the server
      will retrieve the next 50 results into the cache.
      http://wiki.apache.org/solr/SolrCaching
      Yes, Solr looks into the cache to retrieve the fields to be returned.
  
   2) Yes, we can have different tokenizers or filters for index & search. We
   need not create a different fieldtype. We need to configure the same
   fieldtype (datatype) for the index & search analyzer sections differently.

      e.g.
  
   <fieldType name="textSpell" class="solr.TextField"
       positionIncrementGap="100" stored="false" multiValued="true">
     <analyzer type="index">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <!--<filter class="solr.SynonymFilterFactory"
           synonyms="Synonyms.txt" ignoreCase="true" expand="false"/>-->
       <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt"/>
       <filter class="solr.StandardFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.StandardFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>
  
  
  
   Regards,
   Abhay
  
   On Tue, Sep 15, 2009 at 6:41 PM, Shashikant Kore 
 shashik...@gmail.com
  wrote:
  
   Hi,
  
   I am familiar with Lucene and trying out Solr.
  
    I have an index which was created outside Solr. The index is fairly
    simple with two fields - document_id & content. The query result needs
   to return all the document IDs. The result need not be ordered by the
   score. For this, in Lucene, I use custom hit collector with search to
   get results quickly. The index has a few million documents and
 queries
   returning hundreds of thousands of documents are not uncommon. So,
 the
   speed is crucial here.
  
    Since retrieving the document_id for each document is slow, I am using
    FieldCache to store the values of document_id. For all the results
   collected (in a bitset) with hit collector, document_id field is
   retrieved from the fieldcache.
  
   1. How can I effectively disable scoring? I have read that
   ConstantScoreQuery is quite fast, but from the code, I see that it is
   used only for wildcard queries. How can I use ConstantScoreQuery for
   all the queries (boolean, term, phrase, ..)?  Also, is
   ConstantScoreQuery as fast as a custom hit collector?
  
   2. How can Solr take advantage of the fieldcache while returning the
   field document_id? The documentation says, fieldcache can be
   explicitly auto warmed with Solr.  If fieldcache is available and
   initialized at the beginning, will solr look into the cache to
   retrieve the fields to be returned?
  
   3. If there is an additional field for stemmed_content on which
 search
   needs to use different analyzer, I suppose, that could be specified
 by
   fieldType attribute in the schema.
  
   Thank you,
  
   --shashi
  
  
 
 



Re: When to use Solr over Lucene

2009-09-16 Thread Israel Ekpo
Also Solr simplifies the process of implementing the client side interface.
You can use the same indices with clients written in any programming
language.

The client side could be in virtually any programming language of your
choosing.

If you were to work directly with Lucene, that would not be the case.

On Wed, Sep 16, 2009 at 9:49 AM, Israel Ekpo israele...@gmail.com wrote:

 Comparing Solr to Lucene is not exactly an apples-to-apples comparison.

 Solr is a superset of Lucene. It uses the Lucene engine to index and
 process requests for data retrieval.

 Start here first:
 http://lucene.apache.org/solr/features.html#Solr+Uses+the+Lucene+Search+Library+and+Extends+it

  It would be unfair to compare the Apache web server to a CGI scripting
  interface.

  The Apache web server is just the container through which the web browser
  interacts with the CGI scripts.

 This is very similar to how Solr is related to Lucene.


 On Wed, Sep 16, 2009 at 9:26 AM, balaji.a reachbalaj...@gmail.com wrote:


 Hi All,
    I am aware that Solr internally uses Lucene for search and indexing. But
  it would be helpful if anybody could explain the Solr features that are not
 provided by Lucene.

 Thanks,
 Balaji.
 --
 View this message in context:
 http://www.nabble.com/When-to-use-Solr-over-Lucene-tp25472354p25472354.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Good Enough is not good enough.
 To give anything less than your best is to sacrifice the gift.
 Quality First. Measure Twice. Cut Once.




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Disabling tf (term frequency) during indexing and/or scoring

2009-09-16 Thread Alexey Serba
Hi Aaron,

You can override the default Lucene Similarity and disable the tf and
lengthNorm factors in the scoring formula (see
http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html
and http://lucene.apache.org/java/2_4_1/api/index.html ).

You need to

1) compile the following class and put it into Solr WEB-INF/classes
---
package my.pkg;

import org.apache.lucene.search.DefaultSimilarity;

public class NoLengthNormAndTfSimilarity extends DefaultSimilarity {

    // Ignore field length: every non-empty field gets the same norm.
    public float lengthNorm(String fieldName, int numTerms) {
        return numTerms > 0 ? 1.0f : 0.0f;
    }

    // Ignore term frequency: one or more occurrences all score the same.
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}
---

2) Add <similarity class="my.pkg.NoLengthNormAndTfSimilarity"/>
into your schema.xml:
http://wiki.apache.org/solr/SchemaXml#head-e343cad75d2caa52ac6ec53d4cee8296946d70ca

HTH,
Alex

On Mon, Sep 14, 2009 at 9:50 PM, Aaron McKee ucbmc...@gmail.com wrote:
 Hello,

 Let me preface this by admitting that I'm still fairly new to Lucene and
 Solr, so I apologize if any of this sounds naive and I'm open to thinking
 about my problem differently.

 I'm currently responsible for a rather large dataset of business records
 that I'm trying to build a Lucene/Solr infrastructure around, to replace an
 in-house solution that we've been using for a few years. These records are
 sourced from multiple providers and there's often a fair bit of overlap in
 the business coverage. I have a set of fuzzy correlation libraries that I
 use to identify these documents and I ultimately create a super-record that
 includes metadata from each of the providers. Given the nature of things,
 these providers often have slight variations in wording or spelling in the
 overlapping fields (it's amazing how many ways people find to refer to the
 same business or address). I'd like to capture these variations, as they
 facilitate searching, but TF considerations are currently borking field
 scoring here.

 For example, taking business names into consideration, I have a Solr schema
 similar to:

 <field name="name_provider1" type="string" indexed="false" stored="false"
  multiValued="true"/>
 ...
 <field name="name_providerN" type="string" indexed="false" stored="false"
  multiValued="true"/>
 <field name="nameNorm" type="text" indexed="true" stored="false"
  multiValued="true" omitNorms="true"/>

 <copyField source="name_provider1" dest="nameNorm"/>
 ...
 <copyField source="name_providerN" dest="nameNorm"/>

 For any given business record, there may be 1..N business names present in
 the nameNorm field (some with naming variations, some identical). With TF
 enabled, however, I'm getting different match scores on this field simply
 based on how many providers contributed to the record, which is not
 meaningful to me. For example, a record containing <nameNorm>foo bar</nameNorm>
 twice (two values, separated by the positionIncrementGap) is necessarily scoring
 higher than a record just containing <nameNorm>foo bar</nameNorm> once.  Although I
 wouldn't mind TF data being considered within each discrete field value, I
 need to find a way to prevent score inflation based simply on the number of
 contributing providers.

 Looking at the mailing list archive and searching around, it sounds like the
 omitTf boolean in Lucene used to function somewhat in this manner, but has
 since taken on a broader interpretation (and name) that now also disables
 positional and payload data. Unfortunately, phrase support for fields like
 this is absolutely essential. So what's the best way to address a need like
 this? I guess I don't mind whether this is handled at index time or search
 time, but I'm not sure what I may need to override or if there's some
 existing provision I should take advantage of.

 Thank you for any help you may have.

 Best regards,
 Aaron



Re: Disabling tf (term frequency) during indexing and/or scoring

2009-09-16 Thread Erik Hatcher
Just FYI - you can put Solr plugins in solr-home/lib as JAR files  
rather than messing with solr.war


Erik

On Sep 16, 2009, at 10:15 AM, Alexey Serba wrote:


Hi Aaron,

You can override the default Lucene Similarity and disable the tf and
lengthNorm factors in the scoring formula (see
http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html
and http://lucene.apache.org/java/2_4_1/api/index.html ).

You need to

1) compile the following class and put it into Solr WEB-INF/classes
---
package my.pkg;

import org.apache.lucene.search.DefaultSimilarity;

public class NoLengthNormAndTfSimilarity extends DefaultSimilarity {

    // Ignore field length: every non-empty field gets the same norm.
    public float lengthNorm(String fieldName, int numTerms) {
        return numTerms > 0 ? 1.0f : 0.0f;
    }

    // Ignore term frequency: one or more occurrences all score the same.
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}
---

2) Add <similarity class="my.pkg.NoLengthNormAndTfSimilarity"/>
into your schema.xml:
http://wiki.apache.org/solr/SchemaXml#head-e343cad75d2caa52ac6ec53d4cee8296946d70ca

HTH,
Alex

On Mon, Sep 14, 2009 at 9:50 PM, Aaron McKee ucbmc...@gmail.com  
wrote:

Hello,

Let me preface this by admitting that I'm still fairly new to  
Lucene and
Solr, so I apologize if any of this sounds naive and I'm open to  
thinking

about my problem differently.

I'm currently responsible for a rather large dataset of business  
records
that I'm trying to build a Lucene/Solr infrastructure around, to  
replace an
in-house solution that we've been using for a few years. These  
records are
sourced from multiple providers and there's often a fair bit of  
overlap in
the business coverage. I have a set of fuzzy correlation libraries  
that I
use to identify these documents and I ultimately create a super- 
record that
includes metadata from each of the providers. Given the nature of  
things,
these providers often have slight variations in wording or spelling  
in the
overlapping fields (it's amazing how many ways people find to refer  
to the
same business or address). I'd like to capture these variations, as  
they
facilitate searching, but TF considerations are currently borking  
field

scoring here.

For example, taking business names into consideration, I have a  
Solr schema

similar to:

<field name="name_provider1" type="string" indexed="false" stored="false"
 multiValued="true"/>
...
<field name="name_providerN" type="string" indexed="false" stored="false"
 multiValued="true"/>
<field name="nameNorm" type="text" indexed="true" stored="false"
 multiValued="true" omitNorms="true"/>

<copyField source="name_provider1" dest="nameNorm"/>
...
<copyField source="name_providerN" dest="nameNorm"/>

For any given business record, there may be 1..N business names  
present in
the nameNorm field (some with naming variations, some identical).  
With TF
enabled, however, I'm getting different match scores on this field  
simply

based on how many providers contributed to the record, which is not
meaningful to me. For example, a record containing <nameNorm>foo bar</nameNorm>
twice (two values, separated by the positionIncrementGap) is necessarily
scoring higher than a record just containing <nameNorm>foo bar</nameNorm>
once.  Although I
wouldn't mind TF data being considered within each discrete field  
value, I
need to find a way to prevent score inflation based simply on the  
number of

contributing providers.

Looking at the mailing list archive and searching around, it sounds  
like the
omitTf boolean in Lucene used to function somewhat in this manner,  
but has
since taken on a broader interpretation (and name) that now also  
disables
positional and payload data. Unfortunately, phrase support for  
fields like
this is absolutely essential. So what's the best way to address a  
need like
this? I guess I don't mind whether this is handled at index time or  
search

time, but I'm not sure what I may need to override or if there's some
existing provision I should take advantage of.

Thank you for any help you may have.

Best regards,
Aaron





Any way to encrypt/decrypt stored fields?

2009-09-16 Thread Jay Hill
For security reasons (say I'm indexing very sensitive data, medical records
for example) is there a way to encrypt data that is stored in Solr? Some
businesses I've encountered have such needs and this is a barrier to them
adopting Solr to replace other legacy systems. Would it require a
custom-written filter to encrypt during indexing and decrypt at query time,
or is there something I'm unaware of already available to do this?

-Jay


Re: CSV Update - Need help mapping csv field to schema's ID

2009-09-16 Thread Insight 49, LLC

Thanks guys...

Yonik and Grant commented on this thread in the dev group.

Dan

Chris Hostetter wrote:

: I would like to add an additional name:value pair for every line, mapping the
: sku field to my schema's id field:
: 
: .map={sku.field}:{id}


the map param is for replacing a *value* with a different value ... it's 
useful for things like numeric codes in CSV files that you want to replace 
with strings in your index.
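
For instance (field name and values hypothetical), adding
f.category.map=0:unknown&f.category.map=1:electronics to the update request
should replace the numeric codes 0 and 1 with those strings in the category
field as the CSV is loaded.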


: I would prefer NOT to change the schema by adding a <copyField source="sku"
: dest="id"/>.

that's the only solution i can think of unless you want to write an 
UpdateProcessor.



-Hoss





Re: Any way to encrypt/decrypt stored fields?

2009-09-16 Thread Bill Au
That's certainly something that is doable with a filter.  I am not aware of
any available.

Bill

On Wed, Sep 16, 2009 at 10:39 AM, Jay Hill jayallenh...@gmail.com wrote:

 For security reasons (say I'm indexing very sensitive data, medical records
 for example) is there a way to encrypt data that is stored in Solr? Some
 businesses I've encountered have such needs and this is a barrier to them
 adopting Solr to replace other legacy systems. Would it require a
 custom-written filter to encrypt during indexing and decrypt at query time,
 or is there something I'm unaware of already available to do this?

 -Jay



Re: Any way to encrypt/decrypt stored fields?

2009-09-16 Thread Erik Hatcher
This could be achieved purely client-side if all you're talking about  
is a stored field (not indexed/searchable).  The client-side could  
encrypt and encode the encrypted bits as text that Solr/Lucene can  
store.  Then decrypt client-side.
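
A minimal sketch of that client-side approach (class and key handling are
hypothetical; it assumes AES via javax.crypto with Base64 for the stored
text, and a real deployment would pick an explicit cipher mode and IV
rather than the bare "AES" default):

---
import java.util.Base64;

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class StoredFieldCrypto {

    private final SecretKeySpec key;

    public StoredFieldCrypto(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES"); // e.g. a 16-byte key for AES-128
    }

    // Before indexing: the returned Base64 text is what gets stored in Solr.
    public String encrypt(String plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return Base64.getEncoder().encodeToString(cipher.doFinal(plaintext.getBytes("UTF-8")));
    }

    // After querying: turns the stored field value back into plaintext.
    public String decrypt(String storedValue) throws Exception {
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.DECRYPT_MODE, key);
        return new String(cipher.doFinal(Base64.getDecoder().decode(storedValue)), "UTF-8");
    }
}
---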


Erik

On Sep 16, 2009, at 10:39 AM, Jay Hill wrote:

For security reasons (say I'm indexing very sensitive data, medical  
records
for example) is there a way to encrypt data that is stored in Solr?  
Some
businesses I've encountered have such needs and this is a barrier to  
them

adopting Solr to replace other legacy systems. Would it require a
custom-written filter to encrypt during indexing and decrypt at  
query time,

or is there something I'm unaware of already available to do this?

-Jay




Re: do NOT want to stem plurals for a particular field, or words

2009-09-16 Thread Alexey Serba
  You can enable/disable stemming per field type in the schema.xml, by
 removing the stemming filters from the type definition.

  Basically, copy your preferred type, rename it to something like
  'text_nostem', remove the stemming filter from the type, and use
  'text_nostem' as your field's type.
 Plus, you can search both fields, text_stemmed and text_exact, using the
 DisMax handler and boost the text_exact match. Thus if you search for
 'articles' you'll get all results with 'articles' and 'article', but the
 exact matches will be on top.
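
A minimal sketch of that arrangement (field and type names hypothetical;
'text_nostem' is the copied type with the stemming filter removed):

<field name="text_stemmed" type="text" indexed="true" stored="false"/>
<field name="text_exact" type="text_nostem" indexed="true" stored="false"/>
<copyField source="content" dest="text_stemmed"/>
<copyField source="content" dest="text_exact"/>

and in the DisMax handler, boost the unstemmed field:

qf=text_exact^2.0 text_stemmed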


Re: faceted query not working as i expected

2009-09-16 Thread Jonathan Vanasco

Thank you Ahmet.

I forgot to encapsulate the searched string in quotations.

On Sep 15, 2009, at 5:19 PM, AHMET ARSLAN wrote:




--- On Tue, 9/15/09, Jonathan Vanasco jvana...@2xlp.com wrote:


From: Jonathan Vanasco jvana...@2xlp.com
Subject: faceted query not working as i expected
To: solr-user@lucene.apache.org
Date: Tuesday, September 15, 2009, 10:54 PM
I'm trying to request documents that
have facet.venue_type as "Private Collection".

Instead I'm also getting items where another field is
marked "Permanent Collection".

My schema has:

<fields>
  <field name="venue_type" type="text"
   indexed="true" stored="true" required="false" />
  <field name="facet.venue_type"
   type="string" indexed="true" stored="true" required="false" />
</fields>
<copyField source="venue_type" dest="facet.venue_type" />


My query is

q=*:*
qt=standard
facet=true
facet.missing=true
facet.field=facet.venue_type
fq=venue_type:Private+Collection

Can anyone offer a suggestion as to what I'm doing wrong ?



The filter query fq=venue_type:Private+Collection has a part that
runs on the default field. It is parsed to venue_type:Private
defaultField:Collection. You can use

fq=venue_type:"Private+Collection"
or
fq=venue_type:(Private AND Collection)
instead.

These will/may bring documents having something like "Private Collection"
in the venue_type field since it is a tokenized field.


If you want to retrieve documents that have facet.venue_type as
"Private Collection" you can use fq=facet.venue_type:"Private
Collection", which operates on a string (non-tokenized) field.


Hope this helps.
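
Putting that together with the original query, the request would then look
something like this (quotes shown unencoded for readability):

q=*:*&qt=standard&facet=true&facet.missing=true&facet.field=facet.venue_type&fq=facet.venue_type:"Private Collection"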







Highlighting in stemmed or n-grammed fields possible?

2009-09-16 Thread David Espinosa
Hi,

Does anybody know how to get the highlighted field when the q term matches in
a stemmed or n-grammed field?

When matching in a normal field (not stemmed or n-grammed), highlighting works
perfectly as expected. But in stemmed matching cases, no highlighting fields
are recovered, and in n-gram matching the highlighting field is recovered but
with the markup in the wrong place (for example, if q="solr" matches "here is
solr", the result is "<em>here</em> is solr").

All fields are stored (and indexed as well...).



Thanks in advance.


Re: FileListEntityProcessor and LineEntityProcessor

2009-09-16 Thread Fergus McMenemie
Hi,

I'm trying to import data from a list of files using the
FileListEntityProcessor. Here is my import configuration:

  <dataSource type="FileDataSource" name="fileDataSource"/>
  <document name="dict-entries">
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="d:\my\directory\" fileName=".*WRK" recursive="false"
            rootEntity="false">
      <entity name="jc"
              processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}"
              dataSource="fileDataSource"
              transformer="myTransformer">
      </entity>
    </entity>
  </document>

If I have only one file in d:\my\directory\ then everything works correctly.
If I have multiple files then I get the following exception: 

Sorry but I don't quite follow this. FileListEntityProcessor and
LineEntityProcessor are somewhat similar in that they provide
a list of filenames which the likes of XPathEntityProcessor
then open and parse.

Is the above your complete data-config.xml?

Can you provide more detail on what you are trying to do? ...
You seem to listing all files d:\my\directory\.*WRK. Do 
these WRK files contain lists of files to be indexed?





Sep 16, 2009 9:48:46 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: f document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Problem reading from input Processing Document # 53812
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:112)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:348)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:376)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
Caused by: java.io.IOException: Stream closed
        at java.io.BufferedReader.ensureOpen(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:109)
        ... 8 more
Sep 16, 2009 9:48:46 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Problem reading from input Processing Document # 53812
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:112)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:348)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:376)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
Caused by: java.io.IOException: Stream closed
        at java.io.BufferedReader.ensureOpen(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:109)
        ... 8 more



Note that my input files have 53812 lines, which is the same as the document
number that I'm choking on. Does anyone know what I'm doing wrong?

Thanks,

Wojtek
-- 
View this message in context: 
http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25476443.html
Sent from the Solr - User mailing list archive at Nabble.com.

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: FileListEntityProcessor and LineEntityProcessor

2009-09-16 Thread wojtekpia



Fergus McMenemie-2 wrote:
 
 
 Can you provide more detail on what you are trying to do? ...
 You seem to listing all files d:\my\directory\.*WRK. Do 
 these WRK files contain lists of files to be indexed?
 
 

That is my complete data config file. I have a directory containing a bunch
of files that have one entity per line. Each line contains blocks of data.
I parse out each block and process it appropriately using myTransformer. Is
this use of FileListEntityProcessor with LineEntityProcessor not supported?
-- 
View this message in context: 
http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25477613.html
Sent from the Solr - User mailing list archive at Nabble.com.



Effect of SynonymFilter on Solr document fields

2009-09-16 Thread Prasanna Ranganathan
Hi,

 I am a newbie to Solr and request you all to kindly excuse any rookie
mistakes.

 I have the following questions:

We use the Synonym Filter on one of the indexed fields. It is specified in
the schema.xml as one of the filters (for the analyzer type index) for that
field. I believe that this means any tokens which match an entry in the
provided synonym file will have all the forms indexed, provided
expand="true". I am able to verify that by using the Solr admin analysis
tool. However when I use Luke to examine a document in the index which would
have synonyms for that particular field, I see only the original value and
do not see the additional forms that should be added due to the synonym
match for the field in question. I am not sure if I am missing something
here. How do I verify the same?
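
For reference, the index-time synonym setup being described is typically
configured along these lines in schema.xml (tokenizer choice and file name
assumed); with expand="true" the synonym forms are indexed as additional
tokens at the same position, not as additional field values:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
</analyzer>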

Another related question - the field in question here is not specified as
multiValued. However, as I understand it, a synonym match will mean multiple
values for that field. I was not able to find any documentation that
explains this in detail and would like to know how this particular case
impacts the indexing of that field, scoring, etc. How does the behavior of a
field having multiple values due to the SynonymFilter compare and contrast
with the multiValued=true|false flag? What would a synonym match expansion
for a field with multiValued=false mean?

Prasanna.


Re: FileListEntityProcessor and LineEntityProcessor

2009-09-16 Thread wojtekpia

Note that if I change my import file to explicitly list all my files (instead
of using the FileListEntityProcessor) as below then everything works as I
expect.

  <dataSource type="FileDataSource" name="fileDataSource"
              basePath="d:\my\directory\"/>
  <document name="dict-entries">
    <entity name="jc" processor="LineEntityProcessor" url="file1.WRK"
            dataSource="fileDataSource" transformer="myTransformer"></entity>
    <entity name="jc" processor="LineEntityProcessor" url="file2.WRK"
            dataSource="fileDataSource" transformer="myTransformer"></entity>
    <entity name="jc" processor="LineEntityProcessor" url="file3.WRK"
            dataSource="fileDataSource" transformer="myTransformer"></entity>

...

  </document>
-- 
View this message in context: 
http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25480830.html
Sent from the Solr - User mailing list archive at Nabble.com.



Latest trunk locks execution thread in SolrCore.getSearcher()

2009-09-16 Thread Dadasheva, Olga
Hi,

I am testing EmbeddedSolrServer vs StreamingUpdateSolrServer for my
crawlers using more or less recent Solr code, and everything was fine
till today when I took the latest trunk code.
When I start my crawler I see a number of INFO outputs
2009-09-16 21:08:29,399 INFO  Adding
component:org.apache.solr.handler.component.highlightcompon...@36ae83
(SearchHandler.java:132) - [main]
2009-09-16 21:08:29,400 INFO  Adding
component:org.apache.solr.handler.component.statscompon...@1fb24d3
(SearchHandler.java:132) - [main]
2009-09-16 21:08:29,401 INFO  Adding
component:org.apache.solr.handler.component.termvectorcompon...@14ba9a2
(SearchHandler.java:132) - [main]
2009-09-16 21:08:29,402 INFO  Adding  debug
component:org.apache.solr.handler.component.debugcompon...@12ea1dd
(SearchHandler.java:137) - [main]

and then the log/program stops.

The thread dump reveals the following: 

"main" prio=3 tid=0x0003 nid=0x2 in Object.wait() [0xfe67c000..0xfe67fd80]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xeaaf6b10> (a java.lang.Object)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:991)
        - locked <0xeaaf6b10> (a java.lang.Object)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:904)
        at org.apache.solr.handler.ReplicationHandler.getIndexVersion(ReplicationHandler.java:472)
        at org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:490)
        at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:224)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:321)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:307)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482)
        at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:137)
        at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:446)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:578)
        at harvard.solr.search.service.EmbeddedSearchService.setSolrHome(EmbeddedSearchService.java:47)

The same is happening for the  StreamingUpdateSolrServer.

Do you think it's a bug?

Thank you for looking into it,

-Olga


Re: Latest trunk locks execution thread in SolrCore.getSearcher()

2009-09-16 Thread Yonik Seeley
On a quick look, it looks like this was caused (or at least triggered by)
https://issues.apache.org/jira/browse/SOLR-1427

Registering the bean in the SolrCore constructor causes it to
immediately turn around and ask for the stats which asks for a
searcher, which blocks.

-Yonik
http://www.lucidimagination.com

On Wed, Sep 16, 2009 at 9:34 PM, Dadasheva, Olga
olga_dadash...@harvard.edu wrote:
 Hi,

 I am  testing EmbeddedSolrServer vs StreamingUpdateSolrServer  for my
 crawlers using more or less recent Solr code and everything was fine
 till today when I took the latest trunk code.
 When I start my crawler I see a number of INFO outputs
 2009-09-16 21:08:29,399 INFO  Adding
 component:org.apache.solr.handler.component.highlightcompon...@36ae83
 (SearchHandler.java:132) - [main]
 2009-09-16 21:08:29,400 INFO  Adding
 component:org.apache.solr.handler.component.statscompon...@1fb24d3
 (SearchHandler.java:132) - [main]
 2009-09-16 21:08:29,401 INFO  Adding
 component:org.apache.solr.handler.component.termvectorcompon...@14ba9a2
 (SearchHandler.java:132) - [main]
 2009-09-16 21:08:29,402 INFO  Adding  debug
 component:org.apache.solr.handler.component.debugcompon...@12ea1dd
 (SearchHandler.java:137) - [main]

 and then the log/program stops.

 The thread dump reveals the following:

 "main" prio=3 tid=0x0003 nid=0x2 in Object.wait() [0xfe67c000..0xfe67fd80]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xeaaf6b10> (a java.lang.Object)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:991)
        - locked <0xeaaf6b10> (a java.lang.Object)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:904)
        at org.apache.solr.handler.ReplicationHandler.getIndexVersion(ReplicationHandler.java:472)
        at org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:490)
        at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:224)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:321)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:307)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482)
        at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:137)
        at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:446)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:578)
        at harvard.solr.search.service.EmbeddedSearchService.setSolrHome(EmbeddedSearchService.java:47)

 The same is happening for the  StreamingUpdateSolrServer.

 Do you think it's a bug?

 Thank you for looking into it,

 -Olga



Re: FileListEntityProcessor and LineEntityProcessor

2009-09-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
I have opened an issue SOLR-1440

On Thu, Sep 17, 2009 at 2:46 AM, wojtekpia wojte...@hotmail.com wrote:

 Note that if I change my import file to explicitly list all my files (instead
 of using the FileListEntityProcessor) as below then everything works as I
 expect.

  <dataSource type="FileDataSource" name="fileDataSource"
              basePath="d:\my\directory\"/>
  <document name="dict-entries">
    <entity name="jc" processor="LineEntityProcessor" url="file1.WRK"
            dataSource="fileDataSource" transformer="myTransformer"></entity>
    <entity name="jc" processor="LineEntityProcessor" url="file2.WRK"
            dataSource="fileDataSource" transformer="myTransformer"></entity>
    <entity name="jc" processor="LineEntityProcessor" url="file3.WRK"
            dataSource="fileDataSource" transformer="myTransformer"></entity>

 ...

 </document>
 --
 View this message in context: 
 http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25480830.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: [DIH] URLDataSource and fetching a link

2009-09-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
2009/9/17 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 it is possible to have a sub entity which has XPathEntityProcessor
 which can use the link as the url

This may not be a good solution.

But you can use the $hasMore and $nextUrl options of
XPathEntityProcessor to recursively loop if there are more links
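
A rough sketch of the sub-entity idea (feed URL, entity names and XPaths are
hypothetical; the inner entity picks up the outer entity's link column):

<entity name="rss" processor="XPathEntityProcessor"
        url="http://example.com/feed.xml" forEach="/rss/channel/item"
        dataSource="urlDataSource">
  <field column="link" xpath="/rss/channel/item/link"/>
  <entity name="page" processor="XPathEntityProcessor"
          url="${rss.link}" forEach="/html" dataSource="urlDataSource">
    <field column="body" xpath="/html/body"/>
  </entity>
</entity>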

 On Thu, Sep 17, 2009 at 8:57 AM, Grant Ingersoll gsing...@apache.org wrote:
 Many RSS feeds contain a link to some full article.  How can I have the
 DIH get the RSS feed and then have it go and fetch the content at the link?

 Thanks,
 Grant




 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Questions on copyField

2009-09-16 Thread Rahul R
Shalin,
Can you please elaborate a little more on the third response
*You can send the location's value directly as the value of the text field.*
I dont follow. I am adding 'name' and 'age' to the 'text' field through the
schema. If I add the 'location' from the program, will either one copy
(schema or program) not over-write the other ?
*Also note, that you don't really need to index/store the source field. You
can make the location field's type as ignored in the schema.*
Understood

Thank you for your response.

Regards
Rahul
On Wed, Sep 16, 2009 at 1:56 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Mon, Sep 14, 2009 at 5:12 PM, Rahul R rahul.s...@gmail.com wrote:

  Hello,
  I have a few questions regarding the copyField directive in schema.xml
 
  1. Does the destination field store a reference or the actual data ?
 

 It makes a copy. Storing or indexing of the field depends on the field
 configuration.


  If I have soemthing like this
  copyField source=name dest=text/
  then will the values in the 'name' field get copied into the 'text' field
  or
  will the 'text' field only store a reference to the 'name' field ? To put
  it
  more simply, if I later delete the 'name' field from the index will I
 lose
  the corresponding data in the 'text' field ?
 
 
 The values will get copied. If you delete all values from the 'name' field
 from the index, the data in the text field remains as-is.



  2. Is there any inbuilt API which I can use to do the copyField action
  programmatically ?
 
 
 No. But you can always copy explicitly before sending or you can use a
 custom UpdateRequestProcessor to copy values from one field to another
 during indexing.


  3. Can I do a copyfield from the schema as well as programmatically for
 the
  same destination field
  Suppose I want the 'text' field to contain values for name, age and
  location. In my index only 'name' and 'age' are defined as fields. So I
 can
  add directives like
  copyField source=name dest=text/
  copyField source=age dest=text/
  The location however, I want to add it to the 'text' field
  programmatically.
  I don't want to store the location as a separate field in the index. Can
 I
  do this ?
 
 
 You can send the location's value directly as the value of the text field.
  Also note that you don't really need to index/store the source field. You
  can make the location field's type 'ignored' in the schema.

 --
 Regards,
 Shalin Shekhar Mangar.
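
A small SolrJ sketch of what Shalin is suggesting (server URL and values
hypothetical; it assumes 'text' is multiValued, so the values copied in by
the copyField rules and the directly-sent location coexist rather than
overwrite each other):

---
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexWithDirectValue {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc1");
        doc.addField("name", "Acme Corp"); // copied into 'text' by the copyField rule
        doc.addField("age", "32");         // copied into 'text' by the copyField rule
        doc.addField("text", "Boston");    // the location, sent directly into 'text'

        server.add(doc);
        server.commit();
    }
}
---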