Ah, never mind -- I need you instead to view the Solr connection, and paste
that in an email.  Basically, I want to be sure you are not inadvertently
disabling metadata to Solr.

Thanks,
Karl
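
One way to check whether any metadata reached Solr is to look at the params={...} portion of the /update/extract entry in the Solr log quoted further down in this thread. The sketch below is only an illustration: it assumes metadata fields would show up as literal.<name> request parameters (as the literal.id in that log entry suggests), and the helper name literal_fields is made up for this example.

```python
from urllib.parse import parse_qs


def literal_fields(params: str) -> dict:
    """Return the literal.* parameters of a Solr /update/extract query string,
    keyed by field name (hypothetical helper, not part of Solr or ManifoldCF)."""
    parsed = parse_qs(params, keep_blank_values=True)
    prefix = "literal."
    return {k[len(prefix):]: v for k, v in parsed.items() if k.startswith(prefix)}


# params={...} text copied from the LogUpdateProcessorFactory entry in this thread.
# The document URL is elided in the original log and is left elided here.
logged = (
    "resource.name=introduction.pdf"
    "&literal.id=https://...../introduction.pdf"
    "&wt=xml&version=2.2"
)

fields = literal_fields(logged)
print(sorted(fields))                # prints ['id']
print("facetContentType" in fields)  # prints False
```

Under that assumption, the request carried only the document id, so facetContentType was never sent to Solr in the first place.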

On Wed, Feb 22, 2017 at 10:39 AM, Karl Wright <[email protected]> wrote:

> This is how the email appears here:
>
> >>>>>>
>
> 4.
>
> Bottom of Form
>
>
>
>     Marisol Redondo
>
>     Email: [email protected]
>
>     Phone: 35428
>
>
>
> Please note that Revenue cannot guarantee that any personal and sensitive 
> data, sent in plain text via standard email, is fully secure. Customers who 
> choose to use this channel are deemed to have accepted any risk involved. The 
> alternative communication methods offered by Revenue include standard post 
> and the option to use our (encrypted) MyEnquiries service which is available 
> within myAccount and ROS. You can register for either myAccount or ROS on the 
> Revenue website.
>
>
>
> Tabhair faoi deara nach féidir leis na Coimisinéirí Ioncaim ráthaíocht a 
> thabhairt go bhfuil aon sonraí pearsanta agus íogair a gcuirtear isteach i 
> ngnáth-théacs trí r-phost caighdeánach go huile is go hiomlán slán. Meastar 
> go nglacann custaiméirí a úsáideann an cainéal seo le haon riosca bainteach. 
> I measc na modhanna cumarsáide eile atá ag na Coimisinéirí ná post 
> caighdeánach agus an rogha ár seirbhís (criptithe) M'Fhiosruithe a úsáid, tá 
> sí ar fáil laistigh de MoChúrsaí agus ROS. Is féidir leat clárú le haghaidh 
> ceachtar MoChúrsaí nó ROS ar shuíomh gréasáin na gCoimisinéirí.
>
> <<<<<<
>
> In other words I cannot see anything from the 4. stage.
>
>
> Thanks,
>
> Karl
>
>
> On Wed, Feb 22, 2017 at 10:37 AM, Marisol Redondo <
> [email protected]> wrote:
>
>> I was trying with "Keep all incoming metadata" set to both false and true,
>> but I'll take your advice and set it to true.
>>
>> I don't know why you can't see it, but it's stage 4.
>>
>> On 22 February 2017 at 15:26, Karl Wright <[email protected]> wrote:
>>
>>> Hi Marisol,
>>>
>>> Some observations.
>>> (1) It makes no sense to have "Keep all incoming metadata" set to false,
>>> since that will filter out everything that your tika extractor extracts.  I
>>> doubt that is what you have intended.
>>> (2) I can't see the Solr output configuration -- looks like it got
>>> truncated?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Wed, Feb 22, 2017 at 10:12 AM, Marisol Redondo <
>>> [email protected]> wrote:
>>>
>>>> Here you are:
>>>>
>>>> View a Job
>>>>
>>>> Top of Form
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> Name:
>>>>
>>>> revenueToSites
>>>> ------------------------------
>>>>
>>>> Pipeline:
>>>>
>>>> Stage
>>>>
>>>> Type
>>>>
>>>> Precedent
>>>>
>>>> Description
>>>>
>>>> Connection name
>>>>
>>>> 1.
>>>>
>>>> Repository
>>>>
>>>> Revenue Website
>>>>
>>>> 2.
>>>>
>>>> Transformation
>>>>
>>>> 1.
>>>>
>>>> Tikka Metadata Extractor
>>>>
>>>> 3.
>>>>
>>>> Transformation
>>>>
>>>> 2.
>>>>
>>>> Set mimeType and facetContentType
>>>>
>>>> customField
>>>>
>>>> 4.
>>>>
>>>> Output
>>>>
>>>> 3.
>>>>
>>>> sites solr dev
>>>>
>>>> Notifications:
>>>>
>>>> Stage
>>>>
>>>> Description
>>>>
>>>> Connection name
>>>>
>>>> No notification connections
>>>> ------------------------------
>>>>
>>>> Priority:
>>>>
>>>> 5
>>>>
>>>> Start method:
>>>>
>>>> Don't automatically start
>>>> ------------------------------
>>>>
>>>> Schedule type:
>>>>
>>>> Scan every document once
>>>>
>>>> Minimum recrawl interval:
>>>>
>>>> Not applicable
>>>>
>>>> Maximum recrawl interval:
>>>>
>>>> Not applicable
>>>>
>>>> Expiration interval:
>>>>
>>>> Not applicable
>>>>
>>>> Reseed interval:
>>>>
>>>> Not applicable
>>>> ------------------------------
>>>>
>>>> No scheduled run times
>>>> ------------------------------
>>>>
>>>> Maximum hop count for link type 'link':
>>>>
>>>> Unlimited
>>>>
>>>> Maximum hop count for link type 'redirect':
>>>>
>>>> Unlimited
>>>> ------------------------------
>>>>
>>>> Hop count mode:
>>>>
>>>> Delete unreachable documents
>>>> ------------------------------
>>>>
>>>> 1.
>>>>
>>>> Seeds:
>>>>
>>>> https://xxxxxx/index.aspx
>>>> <https://preview.revenuedomain.ie/en/press-office/index.aspx>
>>>> ------------------------------
>>>>
>>>> No canonicalization specified - all URLs will be reordered and have all
>>>> sessions removed
>>>> ------------------------------
>>>>
>>>> No mappings specified; will accept all URLs
>>>> ------------------------------
>>>>
>>>> Include only hosts matching seeds?
>>>>
>>>> yes
>>>> ------------------------------
>>>>
>>>> Include in crawl:
>>>>
>>>> .*
>>>> ------------------------------
>>>>
>>>> Include in index:
>>>>
>>>> .*
>>>> ------------------------------
>>>>
>>>> Exclude from crawl:
>>>>
>>>> \.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|mpg|MPG|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS|<script>|</script>|<script type="text/javascript">)
>>>> [?*!@=].*
>>>> ------------------------------
>>>>
>>>> Exclude from index:
>>>> ------------------------------
>>>>
>>>> Exclude content from index:
>>>> ------------------------------
>>>>
>>>> No access tokens specified
>>>> ------------------------------
>>>>
>>>> Excluded headers:
>>>>
>>>> last-modified
>>>> ------------------------------
>>>>
>>>> 2.
>>>>
>>>> Field mappings:
>>>>
>>>> Metadata field name
>>>>
>>>> Final field name
>>>>
>>>> No field mapping specified
>>>> ------------------------------
>>>>
>>>> Keep all metadata:
>>>>
>>>> true
>>>> ------------------------------
>>>>
>>>> Lower names:
>>>>
>>>> false
>>>> ------------------------------
>>>>
>>>> Write limit:
>>>> ------------------------------
>>>>
>>>> Ignore Tika exceptions:
>>>>
>>>> true
>>>> ------------------------------
>>>>
>>>> Boilerplate extractor:
>>>>
>>>> -- No extraction selected --
>>>> ------------------------------
>>>>
>>>> 3.
>>>>
>>>> Metadata expressions:
>>>>
>>>> Parameter name
>>>>
>>>> Remove this parameter?
>>>>
>>>> Expression ("${fieldname}" references a field)
>>>>
>>>> facetContentType
>>>>
>>>> false
>>>>
>>>> site.ie
>>>> ------------------------------
>>>>
>>>> Keep all incoming metadata
>>>>
>>>> false
>>>>
>>>> Remove empty metadata values
>>>>
>>>> false
>>>> ------------------------------
>>>>
>>>> 4.
>>>>
>>>> Bottom of Form
>>>>
>>>>
>>>>
>>>>     Marisol Redondo
>>>>
>>>>     Email: [email protected]
>>>>
>>>>     Phone: 35428
>>>>
>>>>
>>>>
>>>>
>>>> On 22 February 2017 at 14:53, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Marisol,
>>>>>
>>>>> The [INFO] log entries indicate that your document has almost no
>>>>> metadata at all.  But the Metadata Adjuster transformation connector is
>>>>> designed to do exactly what you want.
>>>>>
>>>>> Can you view your job, and cut and paste the View Job page into an
>>>>> email, so I can see how your metadata adjuster transformation connection
>>>>> and your solr output connections are configured?  Thanks!
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 22, 2017 at 8:57 AM, Marisol Redondo <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Karl, and thank you for the quick answer.
>>>>>>
>>>>>> I was reading the documentation for MCF 1.10, but I'm actually using
>>>>>> MCF 2.5 (sorry for the confusion), and I think this version is
>>>>>> compatible with Solr 6.
>>>>>> The PDF doesn't have any metadata or field called facetContentType.
>>>>>> That is why I have been trying to use the Metadata Adjuster: to add a
>>>>>> new metadata property to the document, so that Solr can index by this
>>>>>> field when I inject the document.
>>>>>> Should I use a different transformation, or is there another way of
>>>>>> doing it?
>>>>>> I am migrating from Nutch to ManifoldCF. In Nutch we can do this with
>>>>>> plugins, and I was assuming that Nutch plugins are the equivalent of
>>>>>> transformation connectors in MCF.
>>>>>>
>>>>>> The complete error in Solr is:
>>>>>>
>>>>>> 2017-02-21 13:19:32.108 INFO  (qtp1854778591-18) [   x:sites] o.a.s.c.PluginBag Going to create a new requestHandler with {type = requestHandler,name = /update/extract,class = solr.extraction.ExtractingRequestHandler,args = {defaults={lowernames=true,fmap.meta=ignored_,fmap.content=_text_,update.chain=add-unknown-fields-to-the-schema,df=_text_}}}
>>>>>>
>>>>>> 2017-02-21 13:19:32.454 INFO  (qtp1854778591-18) [   x:sites] o.a.s.u.p.LogUpdateProcessorFactory [sites]  webapp=/solr path=/update/extract params={resource.name=introduction.pdf&literal.id=https://...../introduction.pdf&wt=xml&version=2.2}{} 0 347
>>>>>>
>>>>>> 2017-02-21 13:19:32.455 ERROR (qtp1854778591-18) [   x:sites] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: [doc=https://....../introduction.pdf] missing required field: facetContentType
>>>>>>         at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:197)
>>>>>>         at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
>>>>>>         at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
>>>>>>         at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> On 21 February 2017 at 14:52, Karl Wright <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Marisol,
>>>>>>>
>>>>>>> Can you find the [INFO] entry in the Solr log for this document?
>>>>>>> That should help clear up any confusion.
>>>>>>>
>>>>>>> Also, for what it is worth, MCF 1.10 is not using a SolrJ that is up
>>>>>>> to date with Solr 6.x.  That could be the source of the problem.  Is
>>>>>>> there any reason you are using a 1.x version of MCF?
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 21, 2017 at 8:42 AM, Marisol Redondo <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi.
>>>>>>>>
>>>>>>>> I'm trying to use the Metadata Adjuster to add one field to the Solr
>>>>>>>> index, but it doesn't inject the field into a Solr field.
>>>>>>>> Maybe I'm misunderstanding the use of the Metadata Adjuster, but I have
>>>>>>>> read in the documentation
>>>>>>>> (https://manifoldcf.apache.org/release/release-1.10/en_US/end-user-documentation.html)
>>>>>>>> that I can add metadata to the document that is going to be indexed
>>>>>>>> into Solr, but the Solr instance gave me the error "missing required
>>>>>>>> field: facetContentType".
>>>>>>>>
>>>>>>>> ManifoldCF Job pipeline:
>>>>>>>> 1. Repository (type web repository)
>>>>>>>> 2. Transformation (Tikka Metadata Extractor)
>>>>>>>> 3. Transformation (type Metadata Adjuster)
>>>>>>>> 4. Output (Solr 6)
>>>>>>>>
>>>>>>>> ManifoldCF Job Metadata Expressions tab:
>>>>>>>>   Parameter name: "facetContentType"
>>>>>>>>   Remove this parameter: false
>>>>>>>>   Expression: xxxx  (the literal text value I want in
>>>>>>>> facetContentType)
>>>>>>>>
>>>>>>>> Solr schema:
>>>>>>>>   .....
>>>>>>>>   <field name="facetContentType" type="string" indexed="true" stored="true" required="true"/>
>>>>>>>>  ....
>>>>>>>>
>>>>>>>> The error logged in ManifoldCF is:
>>>>>>>>       Error from server at http://solrServer:port/solr/core:
>>>>>>>> [doc=https://....../index.aspx] missing required field: facetContentType.
>>>>>>>>
>>>>>>>> Thanks for your help
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
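
The "Keep all incoming metadata" behaviour discussed in this thread can be pictured with a toy model. This is a sketch under stated assumptions, not ManifoldCF's actual code: it assumes single-valued string metadata, that expression-defined fields are added whether or not incoming metadata is kept, and that "${fieldname}" is replaced by the incoming field's value. The function name adjust and the example field "title" are hypothetical.

```python
import re


def adjust(incoming: dict, expressions: dict, keep_all_incoming: bool) -> dict:
    """Toy model of a metadata-adjusting stage (not the real connector)."""
    # With keep_all_incoming False, everything produced by earlier stages is dropped.
    out = dict(incoming) if keep_all_incoming else {}
    for name, expr in expressions.items():
        # Replace each ${field} reference with the incoming value, or "" if absent.
        out[name] = re.sub(
            r"\$\{(\w+)\}",
            lambda m: incoming.get(m.group(1), ""),
            expr,
        )
    return out


tika_output = {"title": "Introduction"}          # hypothetical extracted metadata
expressions = {"facetContentType": "site.ie"}    # expression from the job above

print(adjust(tika_output, expressions, keep_all_incoming=False))
# prints {'facetContentType': 'site.ie'}
print(adjust(tika_output, expressions, keep_all_incoming=True))
# prints {'title': 'Introduction', 'facetContentType': 'site.ie'}
```

In this model the new field is produced either way, so a "missing required field" error would point at a later stage rather than at the expression itself.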
