RE: Mapping and Capture in ExtractingRequestHandler

2011-12-20 Thread Swapna Vuppala
Hi Erick,

Can you please give me a little more information about the SolrJ program and
how to use it to construct a Solr document?

Thanks and Regards,
Swapna.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, December 21, 2011 2:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Mapping and Capture in ExtractingRequestHandler

When you start getting into complex HTML extraction, you're probably
better off using a SolrJ program with a forgiving HTML parser
and extracting the relevant bits yourself and constructing a
SolrDocument.

FWIW,
Erick
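
A minimal sketch of the approach described above, in plain Java with no dependencies beyond the JDK. The class name DivFields, the field names message_body and attachments, and the sample HTML are made up for illustration. A regular expression stands in for the forgiving HTML parser only to keep the sketch self-contained; it does not cope with nested div elements, so a real program would use a proper parser. In a SolrJ program, each entry of the resulting map would presumably be copied into a SolrInputDocument with addField before the document is sent to Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DivFields {
    // Matches <div class="NAME">CONTENT</div>.
    // Assumption: divs are not nested and the class attribute comes first.
    private static final Pattern DIV = Pattern.compile(
            "<div\\s+class=\"([^\"]+)\"\\s*>(.*?)</div>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Collects the text of each mapped div class under its (hypothetical) field name. */
    static Map<String, String> extract(String html, Map<String, String> classToField) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = DIV.matcher(html);
        while (m.find()) {
            String field = classToField.get(m.group(1));
            if (field == null) {
                continue; // a div class we were not asked to capture
            }
            // Drop any inner tags and collapse runs of whitespace.
            String text = m.group(2).replaceAll("<[^>]+>", " ")
                                    .replaceAll("\\s+", " ")
                                    .trim();
            // Several divs of the same class are joined with a space.
            fields.merge(field, text, (a, b) -> a + " " + b);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("message-body", "message_body");
        mapping.put("attachment-entry", "attachments");

        String html = "<html><body>"
                + "<div class=\"message-body\">Hello <b>team</b></div>"
                + "<div class=\"attachment-entry\">report.pdf</div>"
                + "</body></html>";

        // Each printed pair would become one addField(name, value) call in SolrJ.
        extract(html, mapping).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```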

On Tue, Dec 20, 2011 at 12:54 AM, Swapna Vuppala
<swapna.vupp...@arup.com> wrote:
 Hi,

 I understand that we can specify parameters for the ExtractingRequestHandler in
 solrconfig.xml to capture HTML tags of a particular type and map them to
 the desired Solr fields, something like the following.

 <str name="capture">div</str>
 <str name="fmap.div">mysolrfield</str>

 The above setting captures the content of div tags and copies it to the Solr
 field mysolrfield.

 What I am interested in is capturing div tags with a particular class name
 into a Solr field. When extracting content from Outlook messages, I would
 like the content within <div class="message-body"> to go into one Solr field
 and the content within <div class="attachment-entry"> to go into another.

 Can someone please let me know how to achieve this?

 Thanks and Regards,
 Swapna.

 
 Electronic mail messages entering and leaving Arup  business
 systems are scanned for acceptability of content and viruses


Mapping and Capture in ExtractingRequestHandler

2011-12-19 Thread Swapna Vuppala
Hi,

I understand that we can specify parameters for the ExtractingRequestHandler in
solrconfig.xml to capture HTML tags of a particular type and map them to
the desired Solr fields, something like the following.

<str name="capture">div</str>
<str name="fmap.div">mysolrfield</str>

The above setting captures the content of div tags and copies it to the Solr
field mysolrfield.

What I am interested in is capturing div tags with a particular class name
into a Solr field. When extracting content from Outlook messages, I would
like the content within <div class="message-body"> to go into one Solr field
and the content within <div class="attachment-entry"> to go into another.

Can someone please let me know how to achieve this?

Thanks and Regards,
Swapna.




RE: Trim and copy a solr field

2011-12-15 Thread Swapna Vuppala
Hi Juan,

I think an UpdateProcessor is what I need. Can you please tell me more
about how it works?

Thanks and Regards,
Swapna.

-----Original Message-----
From: Juan Grande [mailto:juan.gra...@gmail.com] 
Sent: Thursday, December 15, 2011 11:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Trim and copy a solr field

Hi Swapna,

Do you want to modify the *indexed* value or the *stored* value? The
analyzers modify the indexed value. To modify the stored value, the only
option that I'm aware of is to write an UpdateProcessor that changes the
document before it's indexed.

*Juan*
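
The distinction drawn above can be illustrated without Solr itself. The sketch below is plain Java and does not use the UpdateProcessor API: it only shows the kind of change such a processor would make, namely deriving a new value from an existing field before the document is indexed. The class name LocationField, the field names path and path_location, and the use of a Map as a stand-in for a Solr document are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LocationField {
    /** Returns the path up to, but not including, its last '/'. */
    static String parentOf(String path) {
        int lastSlash = path.lastIndexOf('/');
        // No slash at all: nothing to trim, so return the input unchanged.
        return lastSlash < 0 ? path : path.substring(0, lastSlash);
    }

    /** Adds the derived location to the document before it would be sent for indexing. */
    static void addLocation(Map<String, Object> doc) {
        Object path = doc.get("path");
        if (path instanceof String) {
            doc.put("path_location", parentOf((String) path));
        }
    }

    public static void main(String[] args) {
        // The Map is a hypothetical stand-in for a Solr document.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("path", "file:/myserver/Folder1/SubFol1/Sub-Fol2/Test.msg");
        addLocation(doc);
        System.out.println(doc.get("path_location"));
    }
}
```

Because the value is changed on the document itself rather than inside an analyzer, the trimmed path is what gets stored as well as indexed.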



On Tue, Dec 13, 2011 at 2:05 AM, Swapna Vuppala <swapna.vupp...@arup.com> wrote:

 Hi Juan,

 Thanks for the reply. I tried using this, but I don't see any effect of
 the analyzer/filter.

 I tried copying my Solr field to another field of the type defined below.
 Then I indexed a couple of documents with the new schema, but I see that both
 fields have the same value.
 I am looking at the indexed data in Luke.

 I am assuming that analyzers process the field value (as specified by the
 various filters etc.) and then store the modified value. Is that true? What
 else could I be missing here?

 Thanks and Regards,
 Swapna.

 -----Original Message-----
 From: Juan Grande [mailto:juan.gra...@gmail.com]
 Sent: Monday, December 12, 2011 11:50 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Trim and copy a solr field

 Hi Swapna,

 You could try using a copyField to a field that uses
 PatternReplaceFilterFactory:

 <fieldType class="solr.TextField" name="path_location">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="(.*)/.*"
             replacement="$1"/>
   </analyzer>
 </fieldType>

 The regular expression may not be exactly what you want, but it will give
 you an idea of how to do it. I'm pretty sure there must be some other ways
 of doing this, but this is the first that comes to my mind.

 *Juan*



 On Mon, Dec 12, 2011 at 4:46 AM, Swapna Vuppala <swapna.vupp...@arup.com>
 wrote:

  Hi,

  I have a Solr field that contains the absolute path of the file that is
  indexed, which will be something like
  file:/myserver/Folder1/SubFol1/Sub-Fol2/Test.msg.

  I am interested in indexing the location in a separate field. I was
  looking for some way to trim the field value from the last occurrence of
  the character /, so that I can get the location value, something like
  file:/myserver/Folder1/SubFol1/Sub-Fol2, and store it in a new field.
  Can you please suggest some way to achieve this?
 
  Thanks and Regards,
  Swapna.
  
 



Sorting and searching on a field

2011-12-14 Thread Swapna Vuppala
Hi,

I have a field in Solr that I want to be sortable. At the same time, I want
to be able to search on that field without using wildcards. Is that possible?

For example, if I have a field Subject with the value "This is my first
subject", searching in Solr for subject:first should give me this result, and
the field Subject should be sortable.
I have read about the option of copying the value to a different field, using
one tokenized field for searching and one for sorting. But I am looking to do
both things on the same field.

Can someone please point me to a way to achieve this?

Thanks and Regards,
Swapna.



RE: Trim and copy a solr field

2011-12-12 Thread Swapna Vuppala
Hi Juan,

Thanks for the reply. I tried using this, but I don't see any effect of the
analyzer/filter.

I tried copying my Solr field to another field of the type defined below. Then
I indexed a couple of documents with the new schema, but I see that both fields
have the same value.
I am looking at the indexed data in Luke.

I am assuming that analyzers process the field value (as specified by the
various filters etc.) and then store the modified value. Is that true? What
else could I be missing here?

Thanks and Regards,
Swapna.

-----Original Message-----
From: Juan Grande [mailto:juan.gra...@gmail.com] 
Sent: Monday, December 12, 2011 11:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Trim and copy a solr field

Hi Swapna,

You could try using a copyField to a field that uses
PatternReplaceFilterFactory:

<fieldType class="solr.TextField" name="path_location">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="(.*)/.*"
            replacement="$1"/>
  </analyzer>
</fieldType>

The regular expression may not be exactly what you want, but it will give
you an idea of how to do it. I'm pretty sure there must be some other ways
of doing this, but this is the first that comes to my mind.

*Juan*
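
To see what the pattern in this field type does to a value, the same regular expression can be tried in plain Java, since the filter is built on java.util.regex. This is only a sketch of the substitution, not of Solr's analysis chain, and the class name PatternDemo is made up. KeywordTokenizerFactory keeps the whole value as a single token, so the pattern is applied to the full path.

```java
import java.util.regex.Pattern;

final class PatternDemo {
    // Same pattern and replacement as in the fieldType above.
    private static final Pattern LAST_SEGMENT = Pattern.compile("(.*)/.*");

    static String location(String value) {
        // (.*) is greedy, so it runs up to the last '/'; everything after is dropped.
        return LAST_SEGMENT.matcher(value).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(location("file:/myserver/Folder1/SubFol1/Sub-Fol2/Test.msg"));
    }
}
```

A value without any / does not match the pattern and is left unchanged. The substitution affects only the indexed token; the stored value returned with search results is still the original string.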



On Mon, Dec 12, 2011 at 4:46 AM, Swapna Vuppala <swapna.vupp...@arup.com> wrote:

 Hi,

 I have a Solr field that contains the absolute path of the file that is
 indexed, which will be something like
 file:/myserver/Folder1/SubFol1/Sub-Fol2/Test.msg.

 I am interested in indexing the location in a separate field. I was looking
 for some way to trim the field value from the last occurrence of the
 character /, so that I can get the location value, something like
 file:/myserver/Folder1/SubFol1/Sub-Fol2, and store it in a new field.
 Can you please suggest some way to achieve this?

 Thanks and Regards,
 Swapna.
 



Trim and copy a solr field

2011-12-11 Thread Swapna Vuppala
Hi,

I have a Solr field that contains the absolute path of the file that is
indexed, which will be something like
file:/myserver/Folder1/SubFol1/Sub-Fol2/Test.msg.

I am interested in indexing the location in a separate field. I was looking for
some way to trim the field value from the last occurrence of the character /,
so that I can get the location value, something like
file:/myserver/Folder1/SubFol1/Sub-Fol2, and store it in a new field.
Can you please suggest some way to achieve this?

Thanks and Regards,
Swapna.
