Hi Vigi,

I do understand your issue, but I'd propose a general solution: adding new functionality to the Metadata Transformer to achieve your goal. So the setup would be this:
- Use the JCIFS connector's Metadata tab to just include the entire path in the metadata.
- Use the Metadata Transformer to generate two different pieces of metadata, using a new regular expression modification feature that I would write for you, if we can come up with a design for it.

You can write your own completely new transformation connector, but that's no different from what I propose, and not as useful.

Thanks,
Karl

On Fri, Jun 5, 2015 at 9:17 AM, Virgiliu R <[email protected]> wrote:

> Dear Karl,
>
> Maybe I misunderstood the applications for the metadata tab but in my
> scenario I need to extract two types of information from a document's path.
> Right now I am only able to extract one piece of information and put it in
> Solr; it would have been very useful to be able to perform other
> transformations to the paths but it's OK, I can probably write a
> transformation connector of my own.
>
> Thanks,
> vigi
>
> ------------------------------
> Date: Fri, 5 Jun 2015 09:02:59 -0400
> Subject: Re: Job definition metadata with multiple path attribute names
> From: [email protected]
> To: [email protected]
>
> Hi Vigi,
>
> You get, for free, the file name of the document as metadata, from all
> repository connectors, including the jcifs connector:
>
> >>>>>>
> rd.setFileName(fileNameString);
> <<<<<<
>
> The problem is that this is not something you can manipulate in MCF via
> regular expression with the current bevy of supplied transformation
> connectors, because (a) it isn't generic metadata but a fixed property of
> the document, and (b) the Metadata Transformer connector doesn't allow you
> to slice and dice metadata in any case, just compose it into bigger strings.
>
> So you're stuck with either writing a document transformation connector of
> your own, which does what you want, or proposing additional functionality
> for the Metadata Transformer. If it can be done in a backwards compatible
> way, this is something I would support.
>
> I'm not thrilled with the idea of extending the JCIFS connector to build
> multiple independent attributes all from the path; the UI for this
> connector is already quite complex, and the functionality for generically
> manipulating metadata would be useful in general anyway.
>
> Karl
>
>
> On Fri, Jun 5, 2015 at 8:37 AM, Virgiliu R <[email protected]> wrote:
>
> Hello guys,
>
> I have another ManifoldCF 2.0.2 question. Our process consists of indexing
> some documents from a Windows Share and sending them to Solr. I would like
> to extract some information from the documents and put it into specific
> Solr fields. For example, based on the id of the document I am currently
> extracting a specific folder name (using regular expressions on the
> metadata tab of the job definition) and storing it into Solr; this
> works fine.
>
> However, I also want to extract the file extension (using regex) and send
> it to Solr but I am not able to add more than one path attribute name on
> the Metadata tab of the job definition. I already have one that extracts a
> particular folder name from the file path and I would need a second one for
> the file extension.
>
> How would I be able to achieve this?
>
> Regards,
> vigi
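
For illustration only, the kind of slicing discussed in this thread (one full path in, a folder name and a file extension out) could look roughly like the sketch below. It is plain `java.util.regex` code and does not use any ManifoldCF API; the class name `PathMetadataSketch`, both patterns, and the sample path are assumptions made up for the example, and a real design for the Metadata Transformer might look quite different.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ManifoldCF: derives two values from one path string.
class PathMetadataSketch {
    // Assumed layout "share/folder/...": capture the first folder under the share name.
    private static final Pattern FOLDER = Pattern.compile("^/?[^/]+/([^/]+)/");
    // Capture the text after the last dot of the final path segment.
    private static final Pattern EXTENSION = Pattern.compile("\\.([^./]+)$");

    // Returns capture group 1 of the first match, or empty if the pattern does not match.
    static Optional<String> firstGroup(Pattern pattern, String path) {
        Matcher m = pattern.matcher(path);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    static Optional<String> folderOf(String path) {
        return firstGroup(FOLDER, path);
    }

    static Optional<String> extensionOf(String path) {
        return firstGroup(EXTENSION, path);
    }

    public static void main(String[] args) {
        String path = "myshare/projects/reports/q2.pdf"; // made-up sample path
        System.out.println(folderOf(path).orElse("(none)"));
        System.out.println(extensionOf(path).orElse("(none)"));
    }
}
```

Each derived value would then be sent to its own Solr field. Returning an empty `Optional` for a non-matching path leaves it to the caller to decide whether to omit the field or supply a default.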
