I could not locate exactly where in the connector the document is fetched. Could you please help me with the changes?
Thanks,
Ameya

On Thu, Jul 31, 2014 at 4:10 PM, Karl Wright <[email protected]> wrote:

> Hi Ameya,
>
> Since you are already modifying the connector for your purposes, nothing
> is stopping you from modifying it further to not fetch the document and
> instead substitute an empty input stream.
>
> Karl
>
> On Thu, Jul 31, 2014 at 3:03 PM, Ameya Aware <[email protected]> wrote:
>
>> Hi,
>>
>> I have modified the code a little to add different metadata fields, such
>> as the ones below (FileConnector.java):
>>
>>     data.addField("created", new Date(attr.creationTime().toMillis()));
>>     data.addField("last_accessed", new Date(attr.lastAccessTime().toMillis()));
>>     data.addField("last_modified", new Date(file.lastModified()));
>>     data.addField("size", file.length());
>>
>> which are being passed to Solr.
>>
>> Now, can I stop MCF from reading a file and sending its content, and just
>> pass the above information to Solr?
>>
>> Thanks,
>> Ameya
>>
>> On Thu, Jul 31, 2014 at 2:57 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Ameya,
>>>
>>> The file system connector does not retrieve any metadata for a document
>>> at all. So I'm not sure what metadata you are talking about.
>>>
>>> Karl
>>>
>>> On Thu, Jul 31, 2014 at 2:44 PM, Ameya Aware <[email protected]> wrote:
>>>
>>>> So the thing here is that I am not looking for the data or content of
>>>> any of the files. I am just interested in the files' metadata.
>>>>
>>>> So I thought it should be possible to not read any file and just get
>>>> the file's metadata and give it to Solr.
>>>>
>>>> This should save lots of time.
>>>>
>>>> Is it possible to do this?
>>>>
>>>> Thanks,
>>>> Ameya
>>>>
>>>> On Thu, Jul 31, 2014 at 2:13 PM, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Ameya,
>>>>>
>>>>> (1) Please look at the Simple History report. Note what kinds of
>>>>> documents are being fetched, what kinds are being indexed, and how
>>>>> long it is taking. I have noted from your previous posts that you
>>>>> seem to be indexing a lot of very large EXE files. This is useless
>>>>> and you should be excluding them.
>>>>>
>>>>> (2) Please look in the manifoldcf.log file for evidence that fetches
>>>>> and/or Solr indexing requests are being retried due to errors. It
>>>>> doesn't take many documents being chronically retried before forward
>>>>> progress drops to near zero.
>>>>>
>>>>> (3) If you look into (1) & (2) and everything seems fine, it may be a
>>>>> misalignment between availability of several kinds of resources that
>>>>> is the problem. Please get a thread dump of the agents process while
>>>>> it is crawling, using jstack. Post that thread dump and we can tell
>>>>> you what to look at next.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Thu, Jul 31, 2014 at 2:07 PM, Ameya Aware <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am using the file system connector to index my entire C drive,
>>>>>> with Solr as the output connector.
>>>>>>
>>>>>> The initial 100,000 documents were crawled and indexed successfully
>>>>>> in a couple of hours, but after that indexing slowed down badly
>>>>>> (around 15-20 documents per minute).
>>>>>>
>>>>>> I am not able to figure out whether the issue is with MCF or Solr.
>>>>>>
>>>>>> Can you advise me on how to proceed with this?
>>>>>>
>>>>>> Thanks,
>>>>>> Ameya
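
For anyone following the thread, Karl's suggestion (skip fetching the document and substitute an empty input stream) might look roughly like the sketch below. It is not the actual FileConnector.java code: `MetadataOnlySketch`, `readMetadata` and `emptyContent` are hypothetical names, and a plain `Map` stands in for the `data` object from the snippet quoted above. It uses only the standard `java.nio.file` API, reads the same four fields, and never opens the file's content.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of ManifoldCF.
class MetadataOnlySketch {

    // Collects the four fields from the thread using file attributes alone,
    // so the file's content is never read.
    static Map<String, Object> readMetadata(Path path) throws IOException {
        BasicFileAttributes attr = Files.readAttributes(path, BasicFileAttributes.class);
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("created", new Date(attr.creationTime().toMillis()));
        fields.put("last_accessed", new Date(attr.lastAccessTime().toMillis()));
        fields.put("last_modified", new Date(attr.lastModifiedTime().toMillis()));
        fields.put("size", attr.size());
        return fields;
    }

    // Zero-length stream to hand over in place of the real file content.
    static InputStream emptyContent() {
        return new ByteArrayInputStream(new byte[0]);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the path to inspect is given as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        readMetadata(path).forEach((name, value) -> System.out.println(name + " = " + value));
        try (InputStream in = emptyContent()) {
            System.out.println("content bytes available: " + in.available());
        }
    }
}
```

In the connector itself, the assumption is that the empty stream (with a length of 0) would be passed at the point where the file's own stream is currently handed to the document object, and that the `addField` calls stay as they are. Where exactly that point is depends on the connector version, so it needs to be checked against the source.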
