ElasticSearch 0.90
MIME-Version: 1.0
Content-Type: multipart/alternative;
        boundary="_000_6355997B50A79B48B60F55953D30E3B6019370A442B2EXWESTtella_"

--_000_6355997B50A79B48B60F55953D30E3B6019370A442B2EXWESTtella_
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

At one point there was a mapping; I am not sure what happened to it, but
I will look into it at the first opportunity.

Karl

Sent from my Windows Phone
From: Nichols, Richard
Sent: 5/20/2013 12:37 PM
To: [email protected]
Subject: Attachment processing with ElasticSearch Connector to
ElasticSearch 0.90

Hi,

I'm using ManifoldCF 1.2 with ElasticSearch 0.90.  I'm trying to index
PDF files via the "Windows Shares" repository connector.  I have the
elasticsearch-mapper-attachments plugin installed in ElasticSearch.

When I run the job on an empty index, a 'flat' schema is created:
{
  "pdf_docs_flat_schema" : {
    "pdf_docs" : {
      "properties" : {
        "_content_type" : {
          "type" : "string"
        },
        "_name" : {
          "type" : "string"
        },
        "allow_token_document" : {
          "type" : "string"
        },
        "allow_token_share" : {
          "type" : "string"
        },
        "deny_token_document" : {
          "type" : "string"
        },
        "deny_token_share" : {
          "type" : "string"
        },
        "file" : {
          "type" : "string"
        },
        "lastModified" : {
          "type" : "string"
        },
        "type" : {
          "type" : "string"
        }
      }
    }
  }
}

Notice that the _content_type, _name, file, and type fields are all
properties of type "string".  As far as I can tell, the 'type' value of
"attachment" sent with the indexed file is treated as an ordinary piece
of metadata, and the 'file' field (which is sent as a base64-encoded
string) is never processed as an attachment.
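The symptom described above (the 'file' field dynamically mapped as a plain string instead of 'attachment') can be checked mechanically. A minimal Python sketch, assuming the mapping response has the `{index: {type: {"properties": ...}}}` shape shown in the output above; the JSON is trimmed to the relevant fields, and `field_types` is a hypothetical helper, not part of any ElasticSearch client:

```python
import json

# Mapping response in the shape shown above, trimmed to the relevant fields.
MAPPING_JSON = """
{
  "pdf_docs_flat_schema" : {
    "pdf_docs" : {
      "properties" : {
        "_content_type" : {"type" : "string"},
        "_name" : {"type" : "string"},
        "file" : {"type" : "string"},
        "type" : {"type" : "string"}
      }
    }
  }
}
"""


def field_types(mapping, index, doc_type):
    """Return {field_name: mapped_type} for the top-level properties of one type.

    Properties without an explicit "type" are reported as "object".
    """
    props = mapping[index][doc_type]["properties"]
    return {name: spec.get("type", "object") for name, spec in props.items()}


types = field_types(json.loads(MAPPING_JSON), "pdf_docs_flat_schema", "pdf_docs")
if types.get("file") != "attachment":
    # The attachment plugin only parses fields whose mapped type is "attachment".
    print("'file' is mapped as %r, not 'attachment'" % types.get("file"))
```

The same check can be pointed at a live index by feeding it the body of the mapping request instead of the hard-coded string.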

According to
http://www.elasticsearch.org/guide/reference/mapping/attachment-type/
it seems that the connector should use a mapping command to set the
'file' property to type 'attachment', with "_content_type" and
"_name" as subfields of the 'file' property.  Also, through
testing I found that if you want the 'date', 'title', 'author', and
'keywords' fields extracted from the document and saved, they need to
be listed in the mapping too.  (Unfortunately, using a mapping
changes the JSON used to add the document to the index: instead
of sending the base64-encoded file as the value of the 'file' field,
it goes in the 'contents' subfield.)
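A minimal Python sketch of the two request bodies described above, assuming the mapper-attachments format from the linked guide page: a put-mapping body that declares 'file' as type 'attachment' and lists 'date', 'title', 'author', and 'keywords' under "fields", and an index body in which 'file' becomes an object holding "_content_type", "_name", and the base64 payload. The helper names are hypothetical, and the payload key `content` follows the plugin's documented example (the message above calls it 'contents'), so check it against the installed plugin version. The mapping would need to be in place before the first document is indexed; on an empty index without it, dynamic mapping produces the flat schema shown earlier.

```python
import base64
import json


def attachment_mapping(doc_type, field="file",
                       extracted=("date", "title", "author", "keywords")):
    """Build a put-mapping body that declares `field` as an attachment.

    Each extracted metadata field is listed under "fields" with store=yes
    so that it is kept after the plugin extracts it from the document.
    """
    sub = {name: {"store": "yes"} for name in extracted}
    return {doc_type: {"properties": {field: {"type": "attachment", "fields": sub}}}}


def attachment_document(raw_bytes, name, content_type, field="file", **metadata):
    """Build an index body where the attachment field is an object carrying
    _content_type, _name and the base64-encoded file under "content"
    (assumed key name; see the lead-in)."""
    doc = dict(metadata)  # ordinary metadata stays at the top level
    doc[field] = {
        "_content_type": content_type,
        "_name": name,
        "content": base64.b64encode(raw_bytes).decode("ascii"),
    }
    return doc


mapping = attachment_mapping("pdf_docs")
doc = attachment_document(b"%PDF-1.4 ...", "report.pdf", "application/pdf",
                          lastModified="2013-05-20")
print(json.dumps(mapping, indent=2, sort_keys=True))
print(json.dumps(doc, indent=2, sort_keys=True))
```

The sketch only builds the JSON; sending the mapping to the index and then indexing the document is left to whatever HTTP client is in use.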

Am I missing something obvious here?  All I want is for my documents
to be properly indexed.
Or is this a question for the 'dev' mailing list instead?

Thanks,
Rick


============================================================
The information contained in this message may be privileged
and confidential and protected from disclosure. If the reader
of this message is not the intended recipient, or an employee
or agent responsible for delivering this message to the
intended recipient, you are hereby notified that any reproduction,
dissemination or distribution of this communication is strictly
prohibited. If you have received this communication in error,
please notify us immediately by replying to the message and
deleting it from your computer. Thank you. Tellabs
============================================================

--_000_6355997B50A79B48B60F55953D30E3B6019370A442B2EXWESTtella_--