Hello Karl,

I think the easiest approach would be to allow defining multiple mappings from
source metadata fields to destination metadata fields, using regular
expressions. There may be other use cases besides regexes. What you have right
now in version 2.0.2 is very good, except that it only allows one mapping.
This sort of transformation would probably be useful for other types of
repository connections as well.

In fact, the most generic use case would be to be able to apply custom 
transformations from the metadata fields provided by an input connector into 
other output connector metadata fields.
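
As an illustration of the idea only, a minimal Java sketch of several regex
mappings from one source field to different destination fields could look like
the following. The class MappingSketch, the Rule type, and the field names
"path", "folder" and "extension" are hypothetical and not part of ManifoldCF:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; it does not use or mirror any ManifoldCF API.
class MappingSketch {

  // One mapping: take a source field, apply a regex, store capture group 1
  // under the destination field name.
  static final class Rule {
    final String source;
    final Pattern pattern;
    final String destination;

    Rule(String source, String regex, String destination) {
      this.source = source;
      this.pattern = Pattern.compile(regex);
      this.destination = destination;
    }
  }

  // Applies every rule in order. Rules whose source field is missing or whose
  // regex does not match produce no output field.
  static Map<String, String> apply(List<Rule> rules, Map<String, String> input) {
    Map<String, String> output = new LinkedHashMap<>();
    for (Rule rule : rules) {
      String value = input.get(rule.source);
      if (value == null) {
        continue;
      }
      Matcher m = rule.pattern.matcher(value);
      if (m.find() && m.groupCount() >= 1 && m.group(1) != null) {
        output.put(rule.destination, m.group(1));
      }
    }
    return output;
  }

  public static void main(String[] args) {
    List<Rule> rules = new ArrayList<>();
    // Second path segment becomes "folder".
    rules.add(new Rule("path", "^[^/]+/([^/]+)/", "folder"));
    // Text after the last dot becomes "extension".
    rules.add(new Rule("path", "\\.([^./]+)$", "extension"));

    Map<String, String> input = new HashMap<>();
    input.put("path", "share/projects/report.pdf");

    System.out.println(apply(rules, input));
  }
}
```

Both rules read the same source field, which is the part a single mapping
cannot express.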

It would also be very useful to have some way of knowing which metadata fields
each connector makes available. I think I have already asked you about some
details of the Tika transformation connector.

Keep in touch,
vigi

Date: Sat, 6 Jun 2015 06:07:02 -0400
Subject: Re: Job definition metadata with multiple path attribute names
From: [email protected]
To: [email protected]

I attached a patch to CONNECTORS-1209.  I have not tested it yet.  Hopefully 
there will be time to do that later in the weekend.

Karl


On Fri, Jun 5, 2015 at 10:03 AM, Karl Wright <[email protected]> wrote:
Created CONNECTORS-1209 for this functionality.

It's not hard to do, technically, but I need to define a language to describe 
the regex and what you would want to extract.  For instance, right now you 
specify a field value in terms of another field value like this:

stringstringstring${otherfieldname}stringstring

I'd be putting additional specification into ${otherfieldname}, something like 
this:

stringstringstring${otherfieldname:([1234567890]*)}stringstring

... which would extract the first number from the metadata value.  But since 
":" may well be part of a field name right now, I'd need to do something other 
than that, and I'd want to be able to support more complex regexps as well.
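
A minimal Java sketch of how such a substitution could behave, for illustration
only. It assumes the ":" separator exactly as written in the example above, so
it inherits the field-name problem just described; the class TemplateSketch and
the method expand are hypothetical and are not the Metadata Transformer's code
or the CONNECTORS-1209 patch. Note that with "*" the first match of
([1234567890]*) can be the empty string when the value does not start with a
digit, so the usage below uses "+" instead:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not ManifoldCF code.
class TemplateSketch {

  // Matches ${name} or ${name:regex}. Assumption: ":" separates the field name
  // from the regex, and the regex itself contains no "}" character.
  private static final Pattern REF =
      Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

  // Replaces each reference with the field value, or with capture group 1 of
  // the first regex match when a regex is given. Unknown fields and failed
  // matches expand to the empty string.
  static String expand(String template, Map<String, String> fields) {
    Matcher m = REF.matcher(template);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String value = fields.getOrDefault(m.group(1), "");
      String regex = m.group(2);
      if (regex != null) {
        Matcher inner = Pattern.compile(regex).matcher(value);
        if (inner.find() && inner.groupCount() >= 1 && inner.group(1) != null) {
          value = inner.group(1);
        } else {
          value = "";
        }
      }
      // quoteReplacement keeps "$" and "\" in the value from being
      // interpreted as replacement syntax.
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, String> fields = new HashMap<>();
    fields.put("path", "share/folder42/report.pdf");
    // prints id-42-x
    System.out.println(expand("id-${path:([0-9]+)}-x", fields));
  }
}
```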

Karl


On Fri, Jun 5, 2015 at 9:33 AM, Karl Wright <[email protected]> wrote:
Hi Vigi,
I do understand your issue, but I'd propose a general solution of adding new 
functionality to the Metadata Transformer to achieve your goal.  So the setup 
would be this:
- Use the JCIFS connector Metadata tab to just include the entire path in the
  metadata
- Use the Metadata Transformer to generate two different pieces of metadata,
  using a new regular expression modification feature that I would write for
  you, if we can come up with a design for it
You can write your own completely new transformation connector, but that's no 
different than what I propose, and not as useful.
Thanks,
Karl


On Fri, Jun 5, 2015 at 9:17 AM, Virgiliu R <[email protected]> wrote:



Dear Karl,

Maybe I misunderstood the applications for the metadata tab but in my scenario 
I need to extract two types of information from a document's path. Right now I 
am only able to extract one piece of information and put it in Solr; it would 
have been very useful to be able to perform other transformations to the paths 
but it's OK, I can probably write a transformation connector of my own.

Thanks,
vigi
Date: Fri, 5 Jun 2015 09:02:59 -0400
Subject: Re: Job definition metadata with multiple path attribute names
From: [email protected]
To: [email protected]

Hi Vigi,

You get, for free, the file name of the document as metadata, from all 
repository connectors, including the jcifs connector:

>>>>>>
                  rd.setFileName(fileNameString);
<<<<<<

The problem is that this is not something you can manipulate in MCF via regular 
expression with the current bevy of supplied transformation connectors, because 
(a) it isn't generic metadata but a fixed property of the document, and (b) the 
Metadata Transformer connector doesn't allow you to slice and dice metadata in 
any case, just compose it into bigger strings.

So you're stuck with either writing a document transformation connector of your 
own, which does what you want, or proposing additional functionality for the 
Metadata Transformer.  If it can be done in a backwards compatible way, this is 
something I would support.

I'm not thrilled with the idea of extending the JCIFS connector to build 
multiple independent attributes all from the path; the UI for this connector is 
already quite complex, and the functionality for generically manipulating 
metadata would be useful in general anyway.

Karl


On Fri, Jun 5, 2015 at 8:37 AM, Virgiliu R <[email protected]> wrote:



Hello guys,

I have another ManifoldCF 2.0.2 question. Our process consists of indexing some
documents from a Windows Share and sending them to Solr. I would like to
extract some information from the documents and put it into specific Solr
fields. For example, based on the id of the document I am currently extracting
a specific folder name (using regular expressions on the Metadata tab of the
job definition) and storing it in Solr; this works fine.

However, I also want to extract the file extension (using regex) and send it to 
Solr but I am not able to add more than one path attribute name on the Metadata 
tab of the job definition. I already have one that extracts a particular folder 
name from the file path and I would need a second one for the file extension.

How would I be able to achieve this?

Regards,
vigi
                                          

                                          





                                          
