Thanks, I will try that.

On May 1, 2013 3:54 PM, "Karl Wright" wrote:
> There is also a different way to do this entirely - there is a path
> attribute you can send as metadata to Solr. Just include the entire path,
> and put it into a different field that you declare in your schema. See
> "path attribute" in the end-user documentation for the JCIFS connector.
>
> On Wed, May 1, 2013 at 8:52 AM, Karl Wright wrote:
>
>> IE 6 is extremely old and I believe we developed for IE 7 at a minimum
>> (there were two different versions with different functionality we had to
>> support there), and made further changes for IE 8 when it came out. I have
>> no idea what IE 9 or IE 10 do.
>>
>> The only way to change the encoding of the IRI is to modify the JCIFS
>> connector code. But please bear in mind that unless you can show your
>> modifications will work across a wide variety of browsers, we are unlikely
>> to accept these changes back into the code base.
>>
>> The alternative is, since the encoding IS deterministic and reversible,
>> you could readily write a Tika plugin that would modify at least the URL
>> field in the manner you desire. But you could not modify the ID field
>> since ManifoldCF uses this to delete documents that have disappeared.
>>
>> Karl
>>
>> On Wed, May 1, 2013 at 8:45 AM, Yossi Nachum wrote:
>>
>>> The IRI is not working in my IE. I am using an old version of IE, V6 SP3.
>>> But what I really want is to display the correct name of the path with
>>> Hebrew characters.
>>> If I understand you right, then I need to change the representation of
>>> the IRI. How can I do that?
>>>
>>> On May 1, 2013 3:14 PM, "Karl Wright" wrote:
>>>
>>>> Right, that is exactly what I would expect.
>>>>
>>>> ManifoldCF uses a URL (which is constructed by the connector) as the
>>>> primary key for every document as indexed in the search engine.
>>>> The URL has two purposes: first, it is supposed to be unique, and second,
>>>> it is supposed to allow someone who browses to that result to locate the
>>>> document. In the case of JCIFS, the environment is presumed to be the
>>>> local Active Directory domain(s), and the "URL" generated is really a file
>>>> IRI, usually of the form "file://///server.domain/path/filename". You thus
>>>> should be able to paste the "URL" of the document from Solr into a browser
>>>> on a machine in the domain, and see the document load.
>>>>
>>>> As I said before, however, there are already certain problems with this
>>>> because each version of IE differs somewhat in how it deals with non-ASCII
>>>> characters. IRI legal character rules are somewhat different from URL
>>>> rules, but IRIs are nevertheless escaped in various ways. There are
>>>> also multiple equivalent ways of representing the same file path with
>>>> different IRIs.
>>>>
>>>> It is not typical that the ID and URL fields of a document are
>>>> presented to the user in any meaningful way, so your question is
>>>> academic in most settings. If you have a problem with the IRIs not
>>>> actually working in a browser, that's of more immediate interest. Please
>>>> let us know if that's the case.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>> On Wed, May 1, 2013 at 8:04 AM, Yossi Nachum wrote:
>>>>
>>>>> Thanks for your response.
>>>>> I am seeing these characters in Solr when I search for these files.
>>>>> I am using the Solr example site and these characters show up in the
>>>>> ID field and URL field.
>>>>> BTW I am running Solr and MCF on a Linux server.
>>>>>
>>>>> On May 1, 2013 1:11 PM, "Karl Wright" wrote:
>>>>>
>>>>>> Where are you seeing these characters? Are you talking about the
>>>>>> file IRIs that the JCIFS connector generates?
>>>>>> Those IRIs are supposed to be constructed so that your browser would
>>>>>> find them if you paste them into the browser URL window.
>>>>>> Unfortunately, there is no good standard, and people follow IE's
>>>>>> behavior, and IE has changed multiple times in how it deals with
>>>>>> non-Latin-1 characters.
>>>>>>
>>>>>> Please provide a bit more information so that we can provide a better
>>>>>> answer.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Wed, May 1, 2013 at 3:11 AM, Yossi Nachum wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> I installed a search server with Solr and ManifoldCF.
>>>>>>> I want to index my NetApp files over CIFS and I have a problem with
>>>>>>> Hebrew files and directories.
>>>>>>> When I search for these files in Solr I see "%D7%91%D7%..." instead
>>>>>>> of the directory path that contains Hebrew characters.
>>>>>>> I tried to run the Java process with "-Djcifs.encoding=cp1255" but it
>>>>>>> didn't help.
>>>>>>> Can anyone help and tell me how I can index directories/files in
>>>>>>> Hebrew?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yossi
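
For anyone reading this thread later: the "%D7%91" at the start of the string Yossi quoted is consistent with percent-encoded UTF-8 (the bytes D7 91 are the Hebrew letter bet), which is why the encoding is reversible. The Python sketch below only illustrates that reversal for display purposes. It is not the Tika plugin Karl describes and not ManifoldCF code; the function name readable_path and the sample IRI (server, share, and file names) are invented for the example, and it assumes every escape in the IRI is UTF-8.

```python
from urllib.parse import quote, unquote


def readable_path(file_iri: str) -> str:
    """Return a display form of a percent-encoded file IRI.

    Assumption: the escapes are percent-encoded UTF-8 bytes.
    errors="strict" makes any other encoding raise UnicodeDecodeError
    instead of silently producing replacement characters.
    """
    return unquote(file_iri, encoding="utf-8", errors="strict")


# Hypothetical IRI; the server, share, and file names are invented.
iri = "file://///server.domain/share/%D7%91%D7%93%D7%99%D7%A7%D7%94/report.txt"

display = readable_path(iri)
print(display)

# Re-encoding this sample gives back the original string, which is the
# sense in which the mapping is reversible. Real connector output may
# escape a different set of characters.
assert quote(display, safe="/:") == iri
```

As noted in the thread, a decoded value like this is only suitable for a separate display field (or the URL field); the ID field has to stay exactly as ManifoldCF generated it, since it is used to delete documents that have disappeared.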
