Hi Vigi,

You can read up on the Metadata Adjuster Transformation Connector here:
https://manifoldcf.apache.org/release/release-2.1/en_US/end-user-documentation.html#metadataadjuster

I've also just added the following to the documentation for it:

>>>>>>
<p>You can also use regular expressions in the substitution string, for example: "${there|[0-9]*}", which will extract the first sequence of sequential numbers it finds in the value of the field "there", or "${there|string(.*)|1}", which will include everything following "string" in the field value. (The third argument specifies the regular expression group number, with an optional suffix of "l" or "u" meaning lower-case or upper-case, respectively.)</p>
<<<<<<

Karl

On Mon, Jun 8, 2015 at 7:15 AM, Karl Wright <[email protected]> wrote:

> Hi Vigi,
>
> bq. I think the easiest would be to be able to define multiple mappings
> from source metadata fields to destination metadata fields, using regular
> expressions. Maybe there could be some other use cases besides regexes.
> What you have right now on version 2.0.2 is very good, except that it only
> allows one mapping.
>
> The Metadata Transformation Connector patch allows for multiple mappings,
> all different, to multiple destination fields.
>
> bq. In fact, the most generic use case would be to be able to apply custom
> transformations from the metadata fields provided by an input connector
> into other output connector metadata fields.
>
> That is exactly what the Metadata Transformation Connector does.
>
> Karl
>
>
> On Mon, Jun 8, 2015 at 7:02 AM, Virgiliu R <[email protected]> wrote:
>
>> Hello Karl,
>>
>> I think the easiest would be to be able to define multiple mappings from
>> source metadata fields to destination metadata fields, using regular
>> expressions. Maybe there could be some other use cases besides regexes.
>> What you have right now on version 2.0.2 is very good, except that it only
>> allows one mapping. Probably this sort of transformation could be useful
>> for other types of repository connections as well.
>>
>> In fact, the most generic use case would be to be able to apply custom
>> transformations from the metadata fields provided by an input connector
>> into other output connector metadata fields.
>>
>> It would also be very useful to know somehow which are the available
>> metadata fields on the connectors. I think I have already asked you about
>> some details on the Tika transformation connector.
>>
>> Keep in touch,
>> vigi
>>
>> ------------------------------
>> Date: Sat, 6 Jun 2015 06:07:02 -0400
>> Subject: Re: Job definition metadata with multiple path attribute names
>> From: [email protected]
>> To: [email protected]
>>
>> I attached a patch to CONNECTORS-1209. I have not tested it yet.
>> Hopefully there will be time to do that later in the weekend.
>>
>> Karl
>>
>>
>> On Fri, Jun 5, 2015 at 10:03 AM, Karl Wright <[email protected]> wrote:
>>
>> Created CONNECTORS-1209 for this functionality.
>>
>> It's not hard to do, technically, but I need to define a language to
>> describe the regex and what you would want to extract. For instance, right
>> now you specify a field value in terms of another field value like this:
>>
>> stringstringstring${otherfieldname}stringstring
>>
>> I'd be putting additional specification into ${otherfieldname}, something
>> like this:
>>
>> stringstringstring${otherfieldname:([1234567890]*)}stringstring
>>
>> ... which would extract the first number from the metadata value. But
>> since ":" may well be part of a field name right now, I'd need to do
>> something other than that, and I'd want to be able to support more complex
>> regexps as well.
>>
>> Karl
>>
>>
>> On Fri, Jun 5, 2015 at 9:33 AM, Karl Wright <[email protected]> wrote:
>>
>> Hi Vigi,
>>
>> I do understand your issue, but I'd propose a general solution of adding
>> new functionality to the Metadata Transformer to achieve your goal.
>> So the setup would be this:
>>
>> - Use the JCIFS connector Metadata tab to just include the entire path in
>>   the metadata
>> - Use the Metadata Transformer to generate two different pieces of
>>   metadata, using a new regular expression modification feature that I would
>>   write for you, if we can come up with a design for it
>>
>> You can write your own completely new transformation connector, but
>> that's no different than what I propose, and not as useful.
>>
>> Thanks,
>> Karl
>>
>>
>> On Fri, Jun 5, 2015 at 9:17 AM, Virgiliu R <[email protected]> wrote:
>>
>> Dear Karl,
>>
>> Maybe I misunderstood the applications for the metadata tab but in my
>> scenario I need to extract two types of information from a document's path.
>> Right now I am only able to extract one piece of information and put it in
>> Solr; it would have been very useful to be able to perform other
>> transformations to the paths but it's OK, I can probably write a
>> transformation connector of my own.
>>
>> Thanks,
>> vigi
>>
>> ------------------------------
>> Date: Fri, 5 Jun 2015 09:02:59 -0400
>> Subject: Re: Job definition metadata with multiple path attribute names
>> From: [email protected]
>> To: [email protected]
>>
>> Hi Vigi,
>>
>> You get, for free, the file name of the document as metadata, from all
>> repository connectors, including the jcifs connector:
>>
>> >>>>>>
>> rd.setFileName(fileNameString);
>> <<<<<<
>>
>> The problem is that this is not something you can manipulate in MCF via
>> regular expression with the current bevy of supplied transformation
>> connectors, because (a) it isn't generic metadata but a fixed property of
>> the document, and (b) the Metadata Transformer connector doesn't allow you
>> to slice and dice metadata in any case, just compose it into bigger strings.
>>
>> So you're stuck with either writing a document transformation connector
>> of your own, which does what you want, or proposing additional
>> functionality for the Metadata Transformer. If it can be done in a
>> backwards compatible way, this is something I would support.
>>
>> I'm not thrilled with the idea of extending the JCIFS connector to build
>> multiple independent attributes all from the path; the UI for this
>> connector is already quite complex, and the functionality for generically
>> manipulating metadata would be useful in general anyway.
>>
>> Karl
>>
>>
>> On Fri, Jun 5, 2015 at 8:37 AM, Virgiliu R <[email protected]> wrote:
>>
>> Hello guys,
>>
>> I have another ManifoldCF 2.0.2 question. Our process consists of
>> indexing some documents from a Windows Share and sending them to Solr. I
>> would like to extract some information from the documents and put it into
>> specific Solr fields. For example, based on the id of the document I am
>> currently extracting a specific folder name (using regular expressions on
>> the metadata tab of the job definition) and storing it into Solr; this
>> works fine.
>>
>> However, I also want to extract the file extension (using regex) and send
>> it to Solr but I am not able to add more than one path attribute name on
>> the Metadata tab of the job definition. I already have one that extracts a
>> particular folder name from the file path and I would need a second one for
>> the file extension.
>>
>> How would I be able to achieve this?
>>
>> Regards,
>> vigi
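
Editorial note: the substitution syntax described at the top of this thread (${field}, ${field|regex}, and ${field|regex|group} with an optional "l" or "u" suffix) can be illustrated with a small standalone sketch. This is not ManifoldCF's implementation. The class name SubstitutionSketch, the method expand, and the field names "there" and "path" are hypothetical, and the sketch assumes that the regular expression itself contains no "|" or "}" characters and that matching behaves like Java's Matcher.find().

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the ManifoldCF code base.
final class SubstitutionSketch {
    // Matches ${field}, ${field|regex} or ${field|regex|N}, where N may carry an l/u suffix.
    // Assumption: the embedded regex contains neither '|' nor '}'.
    private static final Pattern TOKEN =
        Pattern.compile("\\$\\{([^|}]+)(?:\\|([^|}]*))?(?:\\|(\\d+)([lu]?))?\\}");

    static String expand(String template, Map<String, String> fields) {
        Matcher token = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (token.find()) {
            // Unknown fields expand to the empty string in this sketch.
            String value = fields.getOrDefault(token.group(1), "");
            String regex = token.group(2);
            if (regex != null) {
                // Group 0 (the whole match) is used when no group number is given.
                int group = token.group(3) == null ? 0 : Integer.parseInt(token.group(3));
                Matcher m = Pattern.compile(regex).matcher(value);
                boolean ok = m.find() && group <= m.groupCount() && m.group(group) != null;
                value = ok ? m.group(group) : "";
                // Optional case conversion: "l" lower-cases, "u" upper-cases.
                if ("l".equals(token.group(4))) {
                    value = value.toLowerCase(Locale.ROOT);
                } else if ("u".equals(token.group(4))) {
                    value = value.toUpperCase(Locale.ROOT);
                }
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            token.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        token.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        fields.put("there", "mystring-Report42");
        fields.put("path", "docs/2015/report.PDF");
        System.out.println(expand("id-${there|[0-9]+}", fields));
        System.out.println(expand("${there|string(.*)|1}", fields));
        System.out.println(expand("${path|\\.([^./]+)$|1l}", fields));
    }
}
```

The last call in main corresponds to the file-extension case from the original question: it takes the text after the final dot and lower-cases it. Note that with find()-style matching, as assumed here, a pattern such as [0-9]* also matches the empty string at the start of a value that does not begin with a digit, which is why the sketch uses [0-9]+; the connector's actual matching behaviour may differ.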
