Hi Karl.
In the “Solr Field Mapping” tab there are no field mappings, and the “Keep all
metadata” checkbox is unchecked.

In the “Solr” output connection, on the Schema tab, I see:
Use the Extract Update Handler: Checked

Is this right?

Did you use Tika on a PDF file for your testing?
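
For reference, a quick way to check what actually got stored for one of the crawled documents could be something like this (a minimal sketch, assuming Solr on the default port 8983 and the collection1 core that appears in the log):

import json
import urllib.parse
import urllib.request

# Query one of the documents from the crawl log and ask only for id and category.
doc_id = "http://tatankam.herobo.com/prova/sotto/DopoFullCrawl.pdf"
params = urllib.parse.urlencode({
    "q": 'id:"%s"' % doc_id,
    "fl": "id,category",
    "wt": "json",
})
url = "http://localhost:8983/solr/collection1/select?" + params
with urllib.request.urlopen(url) as resp:
    docs = json.load(resp)["response"]["docs"]
# If the forced metadata had arrived, "category": ["manuale"] should appear here.
print(docs)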




Mario







From: Karl Wright [mailto:[email protected]]
Sent: Friday, 29 August 2014 13:46
To: Karl Wright; Bisonti Mario; [email protected]
Subject: RE: Populate field Solr

Hi Mario,

I tried this here on 1.7 and it worked as expected.

Please look at your solr field mapping tab.  There is a checkbox there which 
suppresses all unmapped fields.  How is this set for you?

Karl

Sent from my Windows Phone
________________________________
From: Karl Wright
Sent: 8/29/2014 6:37 AM
To: Bisonti Mario; [email protected]
Subject: RE: Populate field Solr
Hi Mario,

The reason I wanted the job view output is that there are multiple ways you
can do forced metadata with a web connection.  There's a Forced Metadata tab, a
Metadata tab, and you can add a Metadata Transformer to the pipeline as well.

I will have a look at why Forced Metadata is no longer working, but I suggest 
that you try the other two possibilities while I do that.
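
Another way to narrow it down is to take ManifoldCF out of the picture and send a document straight to the extract handler with the literal parameter set; a minimal sketch (assuming Solr at localhost:8983, the collection1 core, and a hypothetical local test PDF at /tmp/test.pdf):

import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "literal.id": "forced-metadata-test",  # hypothetical test id
    "literal.category": "manuale",
    "commit": "true",
    "wt": "json",
})
url = "http://localhost:8983/solr/collection1/update/extract?" + params
# Post the raw PDF body; the extract handler runs Tika on it server-side.
with open("/tmp/test.pdf", "rb") as f:  # hypothetical test file
    body = f.read()
req = urllib.request.Request(url, data=body,
                             headers={"Content-Type": "application/pdf"})
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))

If that test document then comes back from a query with category=manuale, the Solr side is storing the field correctly and the issue is in the job configuration.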

Thanks,

Karl

Sent from my Windows Phone
________________________________
From: Bisonti Mario
Sent: 8/29/2014 2:50 AM
To: [email protected]<mailto:[email protected]>
Subject: R: Populate field Solr
Ok, thanks.

Tab Name
Name: ScanPdftatankamNEW

Tab Connection
Stage  Type        Precedent  Description  Connection name
1.     Repository                          ConnessioneWeb
2.     Output      1.                      Solr

Tab Forced Metadata
Parameter name: category
Parameter value: manuale

Tab Seeds
http://tatankam.herobo.com/prova/sotto/

Tab Inclusions
Include in crawl:
.*sotto*
Include in index:
.*sotto*


Tab Security, Metadata, Solr Field Mapping
Empty


I omit the Schedule tab because I start the job manually.
I am using ManifoldCF 1.7

Thanks a lot for your support

Mario





From: Karl Wright [mailto:[email protected]]
Sent: Thursday, 28 August 2014 17:54
To: [email protected]
Subject: Re: Populate field Solr

Hi Mario,
No metadata whatsoever is getting through to Solr.
Can you cut/paste the data on the view page of your job please?  View your job, 
and then select the output so I can see how everything is configured.

Karl

On Thu, Aug 28, 2014 at 11:30 AM, Bisonti Mario 
<[email protected]<mailto:[email protected]>> wrote:
INFO  - 2014-08-28 17:26:47.372; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=10000&literal.id=http://tatankam.herobo.com/prova/sotto/&resource.name=index.html&wt=xml&version=2.2}
 {add=[http://tatankam.herobo.com/prova/sotto/ (1477694830537605120)]} 0 5
INFO  - 2014-08-28 17:26:48.976; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=10000&literal.id=http://tatankam.herobo.com/prova/sotto/Using%2520the%2520various%2520optional%2520Film%2520Adapters.pdf&resource.name=Using%2520the%2520various%2520optional%2520Film%2520Adapters.pdf&wt=xml&version=2.2}
 
{add=[http://tatankam.herobo.com/prova/sotto/Using%20the%20various%20optional%20Film%20Adapters.pdf
 (1477694832220569600)]} 0 4
INFO  - 2014-08-28 17:26:51.409; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=10000&literal.id=http://tatankam.herobo.com/prova/sotto/DopoFullCrawl.pdf&resource.name=DopoFullCrawl.pdf&wt=xml&version=2.2}
 {add=[http://tatankam.herobo.com/prova/sotto/DopoFullCrawl.pdf 
(1477694834770706432)]} 0 67
INFO  - 2014-08-28 17:26:51.747; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract 
params={commitWithin=10000&literal.id=http://tatankam.herobo.com/prova/sotto/SAP%2520SSO%2520Authentication%2520with%2520verify.pdf&resource.name=SAP%2520SSO%2520Authentication%2520with%2520verify.pdf&wt=xml&version=2.2}
 
{add=[http://tatankam.herobo.com/prova/sotto/SAP%20SSO%20Authentication%20with%20verify.pdf
 (1477694835126173696)]} 0 58
INFO  - 2014-08-28 17:26:57.372; org.apache.solr.update.DirectUpdateHandler2; 
start 
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
INFO  - 2014-08-28 17:26:57.377; org.apache.solr.search.SolrIndexSearcher; 
Opening Searcher@45d1f61c[collection1] main
INFO  - 2014-08-28 17:26:57.377; org.apache.solr.core.QuerySenderListener; 
QuerySenderListener sending requests to Searcher@45d1f61c[collection1] 
main{StandardDirectoryReader(segments_alc:42455:nrt _ex1(4.9):C4)}
INFO  - 2014-08-28 17:26:57.378; org.apache.solr.core.QuerySenderListener; 
QuerySenderListener done.
INFO  - 2014-08-28 17:26:57.378; org.apache.solr.core.SolrCore; [collection1] 
Registered new searcher Searcher@45d1f61c[collection1] 
main{StandardDirectoryReader(segments_alc:42455:nrt _ex1(4.9):C4)}
INFO  - 2014-08-28 17:26:57.378; org.apache.solr.update.DirectUpdateHandler2; 
end_commit_flush
INFO  - 2014-08-28 17:27:01.329; org.apache.solr.update.DirectUpdateHandler2; 
start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2014-08-28 17:27:01.344; org.apache.solr.core.SolrDeletionPolicy; 
SolrDeletionPolicy.onCommit: commits: num=2
                
commit{dir=NRTCachingDirectory(MMapDirectory@/usr/share/solr/example/solr/collection1/data/index
 
lockFactory=NativeFSLockFactory@/usr/share/solr/example/solr/collection1/data/index;
 maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_alc,generation=13728}
                
commit{dir=NRTCachingDirectory(MMapDirectory@/usr/share/solr/example/solr/collection1/data/index
 
lockFactory=NativeFSLockFactory@/usr/share/solr/example/solr/collection1/data/index;
 maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_ald,generation=13729}
INFO  - 2014-08-28 17:27:01.344; org.apache.solr.core.SolrDeletionPolicy; 
newest commit generation = 13729
INFO  - 2014-08-28 17:27:01.345; org.apache.solr.core.SolrCore; 
SolrIndexSearcher has not changed - not re-opening: 
org.apache.solr.search.SolrIndexSearcher
INFO  - 2014-08-28 17:27:01.346; org.apache.solr.update.DirectUpdateHandler2; 
end_commit_flush
INFO  - 2014-08-28 17:27:01.346; 
org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr 
path=/update/extract params={commit=true&wt=xml&version=2.2} {commit=} 0 17






From: Karl Wright [mailto:[email protected]]
Sent: Thursday, 28 August 2014 17:21
To: [email protected]
Subject: Re: Populate field Solr

Hi Mario,
Can you post the Solr log INFO message for the indexing of the document in 
question?
Thanks,
Karl

On Thu, Aug 28, 2014 at 11:18 AM, Bisonti Mario 
<[email protected]<mailto:[email protected]>> wrote:
Hello.

I have a web repository containing PDF files.

So from ManifoldCF I crawl that directory and index it through the output connector: Solr.

I need to populate the “category” field of the Solr index.

I tried to use a job on ManifoldCF to do this.
Tab: Forced Metadata
Parameter name: category
Parameter value: manuale

But it doesn’t work.

So I don’t understand whether the problem is Tika, which extracts the content of the PDF
documents and passes the field values to Solr without using the parameter name
“category”.
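
In case it helps, this is how I could verify whether my Solr schema defines a “category” field at all (a minimal sketch, assuming the Solr 4.x schema REST API is available on the default port):

import json
import urllib.request

# List the fields defined in the collection1 schema and look for "category".
url = "http://localhost:8983/solr/collection1/schema/fields"
with urllib.request.urlopen(url) as resp:
    fields = json.load(resp)["fields"]
print([f["name"] for f in fields])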

Could you help me?

Thanks a lot

