Re: Indexing a token to a different field in a custom filter

2013-11-12 Thread Dileepa Jayakody
I need to index the processed token into a different field (e.g.
stanbolResponse) of the same document that's being indexed.

I am looking for a way to retrieve the document's id from the TokenStream so
that I can update the same document with new field values. (In my sample
code I'm adding a new document instead of updating the existing one.)
Any pointers, please?

Thanks,
Dileepa





Re: Indexing a token to a different field in a custom filter

2013-11-12 Thread Alvaro Cabrerizo
Hi,

Maybe the synonym filter
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory)
is the model you can look at. You could start by creating a new field type in
your schema that is Stanbol-enhanced. Following the parallel, in the case of
synonyms we could have this schema:

...
<fieldType name="synonymtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
...
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="description" type="synonymtext" indexed="true" stored="true"
       multiValued="true"/>
...

In the case of Stanbol:

...
<fieldType name="stanboltext" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- your Stanbol filter parameters go on this element -->
    <filter class="StanbolFilterFactory"/>
  </analyzer>
</fieldType>
...
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="description" type="stanboltext" indexed="true" stored="true"
       multiValued="true"/>
...

Thus the StanbolFilterFactory is in charge of connecting to Stanbol and
enhancing the data coming from the WhitespaceTokenizerFactory, creating an
output that can be used by other filters.

How do you index your data, then?

Just send your doc:

id:your id
description:the data to be enhanced
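
A rough illustration of the data flow in the filter approach: the plain-Java sketch below (not Lucene's TokenFilter API) models a whitespace tokenizer followed by a filter that keeps each token and appends whatever an enhancer returns for it, much like SynonymFilterFactory with expand=true. EnhancingChain and the fake lookup table are hypothetical; in a real setup the enhancer would be the call to Stanbol.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Library-free model of an analysis chain: tokenizer -> enhancing filter.
// This is NOT the Lucene TokenFilter API; it only illustrates the data flow.
class EnhancingChain {
    // Hypothetical stand-in for a Stanbol call: maps a token to extra terms.
    private final Function<String, List<String>> enhancer;

    EnhancingChain(Function<String, List<String>> enhancer) {
        this.enhancer = enhancer;
    }

    // Like solr.WhitespaceTokenizerFactory: split on runs of whitespace.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Like SynonymFilterFactory with expand=true: keep the original token
    // and append the enhanced terms, all in the SAME field.
    List<String> analyze(String fieldValue) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(fieldValue)) {
            out.add(token);
            out.addAll(enhancer.apply(token));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical lookup table standing in for Stanbol.
        Map<String, List<String>> fakeStanbol =
            Collections.singletonMap("Paris", Collections.singletonList("entity:Paris"));
        EnhancingChain chain = new EnhancingChain(
            t -> fakeStanbol.getOrDefault(t, Collections.emptyList()));
        System.out.println(chain.analyze("visit Paris"));
        // prints [visit, Paris, entity:Paris]
    }
}
```

Everything the filter emits stays in the field being analyzed (description here). An analysis filter only sees the token stream of one field, not the rest of the document, so this approach cannot write to a different field.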


Another path you can follow is to imitate the behaviour of CopyField
(http://wiki.apache.org/solr/SchemaXml#Copy_Fields) in a more sophisticated
fashion, i.e. copy, enhance and put the result in a new field.
Then you could have the following schema:

...
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
...
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="description" type="text" indexed="true" stored="true"
       multiValued="true"/>
<field name="enhancedDescription" type="text" indexed="true" stored="true"
       multiValued="true"/>
<!-- hypothetical element: not part of the standard Solr schema -->
<copyEnhanceField source="description" dest="enhancedDescription"/>

The copyEnhanceField would then be in charge of taking the original field,
sending it to Stanbol, getting the response and writing it into the new field.

How do you index your data then?

Just send your doc:

id:your id
description:the original data

And you will get in solr:

id:your id
description:the original data
enhancedDescription:the enhanced data
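
The copy-enhance-put step can be sketched in plain Java, assuming a document modeled as a map from field name to values. CopyEnhance and the enhancer lambda are hypothetical stand-ins for a SolrInputDocument and a Stanbol client; copyEnhanceField itself is not a standard Solr schema element, so logic like this would have to live in custom code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Plain-Java model of "copy, enhance and put in a new field".
// A document is a map from field name to its (multi-)values.
class CopyEnhance {
    static void copyEnhance(Map<String, List<String>> doc,
                            String source, String dest,
                            UnaryOperator<String> enhance) {
        List<String> values = doc.get(source);
        if (values == null) {
            return; // source field absent: leave the document unchanged
        }
        List<String> enhanced = new ArrayList<>();
        for (String v : values) {
            enhanced.add(enhance.apply(v)); // hypothetical Stanbol call
        }
        // Append to dest (multiValued="true"); the source field is untouched.
        doc.computeIfAbsent(dest, k -> new ArrayList<>()).addAll(enhanced);
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", new ArrayList<>(Collections.singletonList("your id")));
        doc.put("description",
                new ArrayList<>(Collections.singletonList("the original data")));
        // Hypothetical enhancer; a real one would call Stanbol.
        copyEnhance(doc, "description", "enhancedDescription",
                    v -> "enhanced(" + v + ")");
        System.out.println(doc);
        // prints {id=[your id], description=[the original data], enhancedDescription=[enhanced(the original data)]}
    }
}
```

In Solr this loop would most naturally run in an update processor, which sees the whole document before it is indexed.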


Regards


Re: Indexing a token to a different field in a custom filter

2013-11-12 Thread Erick Erickson
Whether what Alvaro outlined works for you or
not, do NOT commit after every document if you
use SolrJ. The commit will hurt performance much
more than the HTTP overhead.

And you can always batch up, say, 1,000 documents
and use the server.add(doclist) method.

Overall, worrying about HTTP overhead is usually a
red herring.
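
A minimal sketch of the batching pattern, assuming a hypothetical Indexer interface in place of the SolrJ client (whose add also accepts a collection of documents): documents are buffered, sent in batches of batchSize, and commit() is called only once, when the writer is closed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a SolrJ client: add(batch) and commit().
interface Indexer<D> {
    void add(List<D> batch);
    void commit();
}

// Buffers documents and sends them in batches; commits only once, in close().
class BatchingWriter<D> implements AutoCloseable {
    private final Indexer<D> indexer;
    private final int batchSize;
    private final List<D> buffer = new ArrayList<>();

    BatchingWriter(Indexer<D> indexer, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.indexer = indexer;
        this.batchSize = batchSize;
    }

    void add(D doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered as one request.
    void flush() {
        if (!buffer.isEmpty()) {
            indexer.add(new ArrayList<>(buffer)); // copy, since buffer is reused
            buffer.clear();
        }
    }

    // Flush the remainder, then commit a single time for the whole run.
    @Override
    public void close() {
        flush();
        indexer.commit();
    }
}
```

With 2,500 documents and a batch size of 1,000, this makes three add calls and a single commit, instead of 2,500 adds and 2,500 commits with the add-and-commit-per-document approach.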

Best,
Erick





Re: Indexing a token to a different field in a custom filter

2013-11-12 Thread Jack Krupansky
Any kind of cross-field processing is best done in an update processor. 
There are a lot of built-in update processors as well as a JavaScript script 
update processor.
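
The reason an update processor fits is that it receives the whole document, so it can read one field and write another before indexing. The plain-Java sketch below models such a chain; Processor, EnhanceProcessor and IndexingEnd are hypothetical names, not Solr's real UpdateRequestProcessor API, and the enhancer stands in for a call to Stanbol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Plain-Java model of an update processor chain: each processor may modify
// the document and then hands it to the next one. Not Solr's real API.
abstract class Processor {
    private final Processor next;

    Processor(Processor next) {
        this.next = next;
    }

    // Default behaviour: pass the document down the chain unchanged.
    void processAdd(Map<String, Object> doc) {
        if (next != null) {
            next.processAdd(doc);
        }
    }
}

// Cross-field step: reads one field, writes the enhanced value to another.
class EnhanceProcessor extends Processor {
    private final String source;
    private final String dest;
    private final UnaryOperator<String> enhancer; // hypothetical Stanbol call

    EnhanceProcessor(String source, String dest,
                     UnaryOperator<String> enhancer, Processor next) {
        super(next);
        this.source = source;
        this.dest = dest;
        this.enhancer = enhancer;
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        Object value = doc.get(source);
        if (value instanceof String) {
            doc.put(dest, enhancer.apply((String) value));
        }
        super.processAdd(doc); // continue down the chain
    }
}

// Last link: stands in for the step that actually indexes the document.
class IndexingEnd extends Processor {
    final List<Map<String, Object>> indexed = new ArrayList<>();

    IndexingEnd() {
        super(null);
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        indexed.add(doc);
    }
}
```

Wiring new EnhanceProcessor("content", "stanbolResponse", enhancer, new IndexingEnd()) and calling processAdd(doc) adds stanbolResponse to the same document, with no second document and no extra HTTP request. In Solr itself, the equivalent is a custom processor (or the script update processor) registered in an update processor chain in solrconfig.xml.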


-- Jack Krupansky

-----Original Message-----
From: Dileepa Jayakody
Sent: Tuesday, November 12, 2013 1:31 AM
To: solr-user@lucene.apache.org
Subject: Indexing a token to a different field in a custom filter




Re: Indexing a token to a different field in a custom filter

2013-11-12 Thread Dileepa Jayakody
Thanks all for your valuable inputs.

I looked at the suggested solutions, and I too feel a *custom update
processor* during indexing will be the best solution: it can handle the
content field by changing its value and storing it in another field.

Do I only need to change the request handler below to intercept all
documents being indexed and perform my custom analysis? Or do I need to
change any other request handler as well?
 <requestHandler name="/update" class="solr.UpdateRequestHandler">

Thanks,
Dileepa





Indexing a token to a different field in a custom filter

2013-11-11 Thread Dileepa Jayakody
Hi All,

In my custom filter, I need to index the processed token into a different
field. The processed token is a Stanbol enhancement response.

The solution I have found so far is to use a Solr client (SolrJ) to add a
new document with my processed field into Solr. Below is a sample code
segment:

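  import java.io.IOException;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  // Called for every processed token: one HTTP add (and commit) per token.
  SolrServer server = new HttpSolrServer("http://localhost:8983/solr/");
  SolrInputDocument doc1 = new SolrInputDocument();
  doc1.addField("id", "id1", 1.0f);              // name, value, boost
  doc1.addField("stanbolResponse", response);    // response: Stanbol output
  try {
      server.add(doc1);
      server.commit();
  } catch (SolrServerException | IOException e) {
      e.printStackTrace();
  }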


This mechanism requires a new HTTP call to the local Solr server for every
token I process for the stanbolRequest field, and I feel it's not very
efficient.

Is there an alternative way to invoke an update request that adds a new
field to the document being indexed from within the filter (without making
an explicit HTTP call using SolrJ)?

Thanks,
Dileepa