Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by FergusMcMenemie:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Add a description of the new LineEntityProcessor

------------------------------------------------------------------------------
  A simple entity processor which can be used to enumerate the list of files from a File System based on some criteria. It does not use a !DataSource. The entity attributes are:
  * '''`fileName`''' :(required) A regex pattern to identify files
  * '''`baseDir`''' : (required) The Base directory (absolute path)
- * '''`recursive`''' : Recursive listing or not.default is 'false '
+ * '''`recursive`''' : Recursive listing or not. Default is 'false'
  * '''`excludes`''' : A Regex pattern of excluded file names
  * '''`newerThan`''' : A date param . Use the format (`yyyy-MM-dd HH:mm:ss`) . It can also be a datemath string eg: ('NOW-3DAYS'). The single quote is necessary . Or it can be a valid variableresolver format like (${var.name})
  * '''`olderThan`''' : A date param . Same rules as above

@@ -796, +796 @@

  [[Anchor(plaintext)]]
  <!> ["Solr1.4"]
- This works mostly like an X!PathEntityProcessor. The only difference is that it does not parse the content. It just gives out the whole content as one big String . It produces one implicit field called 'plainText' .
+ This !EntityProcessor reads all content from the data source into a single implicit field called 'plainText'. The content is not parsed in any way; however, you may add transformers to manipulate the data within 'plainText' as needed or to create other additional fields.

  example:
  {{{
@@ -806, +806 @@

  <entity>
  }}}

+ === LineEntityProcessor ===
+ [[Anchor(LineEntityProcessor)]]
+ <!> ["Solr1.4"]
+
+ This !EntityProcessor reads all content from the data source line by line; a field called 'rawLine' is returned for each line read. The content is not parsed in any way; however, you may add transformers to manipulate the data within 'rawLine' or to create other additional fields.
+
+ The lines read can be filtered by two regular expressions, '''acceptLineRegex''' and '''omitLineRegex'''.
+ This entity's additional attributes are:
+ * '''`url`''' : a required attribute that specifies the location of the input file in a way that is compatible with the configured datasource. If this value is relative and you are using !FileDataSource or URL!DataSource, it is assumed to be relative to '''baseLoc'''.
+ * '''`acceptLineRegex`''' : an optional attribute that, if present, discards any line which does not match the regExp.
+ * '''`omitLineRegex`''' : an optional attribute that is applied after any acceptLineRegex and discards any line which matches this regExp.
+ example:
+ {{{
+ <entity name="jc"
+         processor="LineEntityProcessor"
+         acceptLineRegex="^.*\.xml$"
+         omitLineRegex="/obsolete"
+         url="file:///Volumes/ts/files.lis"
+         rootEntity="false"
+         dataSource="myURIreader1"
+         transformer="RegexTransformer,DateFormatTransformer"
+         >
+    ...
+ }}}
+ While there are use cases where you might need to create a Solr document per line read from a file, it is expected that in most cases the lines read will consist of a pathname which is in turn consumed by another !EntityProcessor
+ such as X!PathEntityProcessor.

  == DataSource ==
  [[Anchor(datasource)]]
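
Editor's sketch for the LineEntityProcessor change above: the two filters are described as running in order, acceptLineRegex first and omitLineRegex second. The Python below only illustrates that ordering; it is not the Solr implementation. The function name `filter_lines` and the sample paths are hypothetical, and it assumes "match" means the pattern is found anywhere in the line, which is what the unanchored `omitLineRegex="/obsolete"` in the example suggests.

```python
import re


def filter_lines(lines, accept_line_regex=None, omit_line_regex=None):
    """Yield the lines that survive both filters (hypothetical helper).

    Assumption: a line "matches" when the pattern is found anywhere in it
    (re.search), not only when the pattern spans the whole line.
    """
    accept = re.compile(accept_line_regex) if accept_line_regex else None
    omit = re.compile(omit_line_regex) if omit_line_regex else None
    for line in lines:
        # Step 1: if an accept pattern is set, drop lines that do not match it.
        if accept is not None and not accept.search(line):
            continue
        # Step 2: applied afterwards, drop lines that match the omit pattern.
        if omit is not None and omit.search(line):
            continue
        yield line


# Made-up input standing in for the contents of a file list such as files.lis.
sample = [
    "/data/current/a.xml",
    "/data/obsolete/b.xml",
    "/data/current/readme.txt",
]

# Same patterns as the entity example in the diff.
kept = list(filter_lines(sample, r"^.*\.xml$", r"/obsolete"))
unfiltered = list(filter_lines(sample))
```

With these inputs only the first path survives: the `.txt` line fails the accept pattern and the `/obsolete` line is removed by the omit pattern. When neither pattern is set, every line passes through, which matches both attributes being optional.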
