Hi Ethan,

The activity logging for most connectors in 1.7.1 is not complete. See CONNECTORS-1077 for details. But in that case you should not see a record labeled with your connection name of the type "document ingest", and you *should* see errors in the ManifoldCF log.

On Mon, Oct 27, 2014 at 3:17 PM, Ethan Wilansky <[email protected]> wrote:

> Hi Karl,
> In simple history there are no indexing activity records showing 0. All of
> the content on this Google Drive endpoint are either small uploaded files
> (docx, pptx, pdf, txt) or Google Docs generated documents, spreadsheets and
> presentations.
>
> With regard to opening a ticket, it might not be worth your while.
> Ultimately, our use case is that we will be leveraging an ES Output
> Connection for retrieving metadata and we will store the binaries on the
> file system. We don’t want to use the ES Attachment plug-in, which is why I
> thought we might be able to combine the ES Output Connection and a File
> System Connection in a job. I suppose another option would be to involve
> Tika, but I’m not clear on whether this will allow me to store the metadata
> in ES with a pointer to the binary in the file system.
>
> Thanks,
> Ethan
>
> On Oct 27, 2014, at 2:27 PM, Karl Wright <[email protected]> wrote:
>
> Hi Ethan,
>
> This does not sound like it is related in any way to the google drive
> connection, unless for some reason the google API is considering some of
> the documents fetched to have only metadata and no content. In this case,
> you'd see size of zero in the simple history for indexing activity record.
> Is that what you see?
>
> As for the filename issues -- file system output connection is supposed to
> emulate WGET. However, there are a number of known issues with this
> connector, for example CONNECTORS-814, and I believe the handling of "&" is
> one such issue. I don't think these characters are allowed file names on
> several operating systems.
>
> Please open a ticket, and describe how you think it should behave (e.g.
> how it should map &'s in urls to legal file name characters), and I'll try
> to come up with a quick patch.
>
> Karl
>
> On Mon, Oct 27, 2014 at 12:15 PM, Ethan Wilansky <[email protected]>
> wrote:
>
>> I’ve run a job that uses a Google Drive Repository Connection and File
>> System Output Connection. My output is pointing to d:\temp\mf on the
>> machine running ManifoldCF.
>>
>> Upon running the job, job status shows:
>> Error: Could not create file 'd:\temp\mf\https\doc-0g-1c-docs.googleusercontent.com\docs\securesc\288dijb8lhptipmnpc6n3dap4bdki35j\ek70aeovi25lp7aibkar61h90pi1i2c3\1414418400000\14058876669334088852\07105634325979498590\0B4rsPDZwaBMUZjI3VGpzZi10dUU?h=00194472260389282923&e=download&gd=true'
>> *(The filename, directory name, or volume label syntax is incorrect)*
>>
>> This same report that the file name, label or syntax is incorrect is
>> being reported by the file system one more time. So, out of 12 files total,
>> 10 are processed. However, for the files that are reported as successfully
>> processed, none of the files appear in the file system.
>>
>> I think the file system path is unusual beyond what I’ve specified for
>> the job (d:\temp\mf). I’m seeing something like the following as the path
>> structure:
>> D:\temp\mf\https\doc-0g-1c-docs.googleusercontent.com\docs\securesc\288dijb8lhptipmnpc6n3dap4bdki35j\ek3m4mhv978b7a2elgov6cm9nipbv36e\1414418400000\13058876669334088852\07105634445979498592
>>
>> Document Status and Queue Status show nothing unusual. I’m running on
>> ManifoldCF release (v1.7.1)
>>
>> Could this be an issue with the way I’m configuring the File System
>> Output Connection or is there something else I need to configure? I
>> properly configured the refresh token, client id and client secret in the
>> Repository Connection.
>>
>> I’ve attached the JSON for the Repository Connection (with client id,
>> client secret and refresh token values removed), my Output Connection and
>> Job Definition.
>>
>> Thanks in advance for your feedback
>> Ethan
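
On the question of how URLs might map to legal file names: the following is only a rough sketch of one possible mapping, not how the File System Output Connection actually behaves. It assumes a scheme\host\path layout like the one seen in the error above, and it percent-escapes the characters Windows rejects in names (including the "?" that starts the query string). The function names, the choice of %XX escapes, and the decision to also escape "&" and "%" are all assumptions made for illustration.

```python
from typing import List
from urllib.parse import urlsplit

# Characters that Windows rejects in file and directory names.
WINDOWS_RESERVED = set('<>:"/\\|?*')
# Assumption: "&" and "%" are escaped as well. "&" only because the thread
# singles it out, "%" so that escaped names stay unambiguous.
ALSO_ESCAPED = set("&%")


def sanitize_segment(segment: str) -> str:
    """Replace problematic characters in one path segment with %XX escapes."""
    out = []
    for ch in segment:
        if ch in WINDOWS_RESERVED or ch in ALSO_ESCAPED or ord(ch) < 32:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def url_to_path_segments(url: str) -> List[str]:
    """Split a URL into scheme, host, and path segments, each made name-safe."""
    parts = urlsplit(url)
    segments = [parts.scheme, parts.netloc]
    segments += [s for s in parts.path.split("/") if s]
    if parts.query:
        # Keep the query string as part of the last segment.
        segments[-1] = segments[-1] + "?" + parts.query
    return [sanitize_segment(s) for s in segments]


if __name__ == "__main__":
    # Hypothetical URL, shaped like the one in the error message.
    print(url_to_path_segments("https://example.com/docs/abc?h=1&e=download&gd=true"))
```

The resulting segments could then be joined under the configured output directory. This sketch does not handle other Windows restrictions such as reserved device names, trailing dots or spaces, or path length limits.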
