Hi Ameya,

You cannot just comment out that line; instead you must supply an input stream. But you can create a null input stream, for example:

data.setBinary(new ByteArrayInputStream(new byte[0]),0);

Karl

On Thu, Jul 31, 2014 at 4:22 PM, Ameya Aware <[email protected]> wrote:

> >>>>>>>>>>>>>>>>>>>>>>>>>>
> long fileBytes = file.length();
> RepositoryDocument data = new RepositoryDocument();
> data.setBinary(is,fileBytes);
> String fileName = file.getName();
> data.setFileName(fileName);
> data.setMimeType(mapExtensionToMimeType(fileName));
>
> <<<<<<<<<<<<<<<<<<<<<<<<<<<
>
> Do I just need to comment out the 3rd line, i.e. data.setBinary(is,fileBytes); ??
>
> Thanks,
> Ameya
>
> On Thu, Jul 31, 2014 at 4:17 PM, Ameya Aware <[email protected]> wrote:
>
>> I could not locate exactly where this is happening.
>>
>> Can you please help me out with the changes?
>>
>> Thanks,
>> Ameya
>>
>> On Thu, Jul 31, 2014 at 4:10 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Ameya,
>>>
>>> Since you are already modifying the connector for your purposes, nothing is stopping you from modifying it further to not fetch the document and instead substitute an empty input stream.
>>>
>>> Karl
>>>
>>> On Thu, Jul 31, 2014 at 3:03 PM, Ameya Aware <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have modified the code a little to add different metadata fields, such as below (FileConnector.java):
>>>>
>>>> data.addField("created", new Date((attr.creationTime().toMillis())));
>>>> data.addField("last_accessed", new Date(attr.lastAccessTime().toMillis()));
>>>> data.addField("last_modified", new Date(file.lastModified()));
>>>> data.addField("size", file.length());
>>>>
>>>> which are being passed to Solr.
>>>>
>>>> Now can I stop MCF from reading a file and sending that content, and just pass the above information to Solr?
>>>>
>>>> Thanks,
>>>> Ameya
>>>>
>>>> On Thu, Jul 31, 2014 at 2:57 PM, Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Ameya,
>>>>>
>>>>> The file system connector does not retrieve any metadata for a document at all. So I'm not sure what metadata you are talking about.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Thu, Jul 31, 2014 at 2:44 PM, Ameya Aware <[email protected]> wrote:
>>>>>
>>>>>> So the thing here is I am not looking for any data or content of any of the files. I am just interested in the metadata of the files.
>>>>>>
>>>>>> So I thought it should be possible to not read any file and just get the metadata of the file and give it to Solr.
>>>>>>
>>>>>> This should save lots of time.
>>>>>>
>>>>>> Is it possible to do this?
>>>>>>
>>>>>> Thanks,
>>>>>> Ameya
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 2:13 PM, Karl Wright <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Ameya,
>>>>>>>
>>>>>>> (1) Please look at the Simple History report. Note what kinds of documents are being fetched, what kinds are being indexed, and how long it is taking. I have noted from your previous posts that you seem to be indexing a lot of very large EXE files. This is useless and you should be excluding them.
>>>>>>>
>>>>>>> (2) Please look in the manifoldcf.log file for evidence that fetches and/or Solr indexing requests are being retried due to errors. It doesn't take many documents being chronically retried before forward progress drops to near zero.
>>>>>>>
>>>>>>> (3) If you look into (1) & (2) and everything seems fine, it may be a misalignment between availability of several kinds of resources that is the problem. Please get a thread dump of the agents process while it is crawling, using jstack. Post that thread dump and we can tell you what to look at next.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 2:07 PM, Ameya Aware <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am using the filesystem connector to index my entire C drive, with Solr as the output connector.
>>>>>>>>
>>>>>>>> The initial 100000 documents were crawled and indexed successfully in a couple of hours, but after that indexing slowed down badly (around 15-20 documents per min).
>>>>>>>>
>>>>>>>> I am not able to figure out whether the issue is with MCF or Solr.
>>>>>>>>
>>>>>>>> Can you advise me on how to proceed with this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Ameya
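
For anyone following the thread, the metadata-only idea (collect file attributes, then hand over an empty stream instead of the file content) can be sketched in plain Java as below. This is only a rough, standalone illustration and not the connector's actual code: the class name MetadataOnlySketch and the helpers emptyStream and collectMetadata are hypothetical, and a plain Map stands in for the RepositoryDocument fields. It also reads all four values from BasicFileAttributes, whereas the snippet in the thread takes two of them from java.io.File. In the connector itself, the stream would presumably be passed as data.setBinary(emptyStream(), 0), as in the reply at the top.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical standalone sketch; none of these names come from ManifoldCF.
public class MetadataOnlySketch {

    // A stream with zero bytes: its first read() reports end of stream.
    public static InputStream emptyStream() {
        return new ByteArrayInputStream(new byte[0]);
    }

    // Gathers the four fields named in the thread from the file's
    // attributes only; the file content is never opened or read.
    public static Map<String, Object> collectMetadata(Path path) throws IOException {
        BasicFileAttributes attr = Files.readAttributes(path, BasicFileAttributes.class);
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("created", new Date(attr.creationTime().toMillis()));
        fields.put("last_accessed", new Date(attr.lastAccessTime().toMillis()));
        fields.put("last_modified", new Date(attr.lastModifiedTime().toMillis()));
        fields.put("size", attr.size());
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // Throwaway file so the sketch can run anywhere.
        Path path = Files.createTempFile("mcf-sketch", ".txt");
        try {
            Files.write(path, new byte[] {1, 2, 3});
            System.out.println(collectMetadata(path));
            // The empty stream has nothing to deliver, so this prints -1.
            System.out.println(emptyStream().read());
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The length argument of 0 matters as much as the empty stream: the original code passes file.length() as the byte count, so keeping that value while supplying no bytes would leave the two out of step.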
