I created a ticket (CONNECTORS-1499) and attached a patch that uses the more detailed format in all situations where hash order could affect things. If you apply the patch, you should definitely see a difference in the JSON output when you dump a job in JSON format. You will still need to learn to use the order-preserving format when generating your own JSON.
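For the script side, here is a minimal sketch of building a start point in the order-preserving form. The `_children_`, `_type_` and `_attribute_` names come from the code comment quoted further down this thread; the helper name `build_startpoint`, the attribute layout of each child, and the way the start point is nested in the rest of the job are assumptions, so compare the result against the JSON exported from a job you ordered by hand in the UI.

```python
import json


def build_startpoint(path, rules):
    """Build one start point whose rules keep the order they are given in.

    `rules` is a list of (rule_type, attributes) pairs, for example
    ("exclude", {"type": "file", "match": "*.mov"}).
    Hypothetical helper: the layout is inferred from the toJSON() comment
    in this thread and is not taken from the ManifoldCF documentation.
    """
    children = []
    for rule_type, attrs in rules:
        # Each child names its own type, so ordering does not depend on
        # how the parser orders the keys of an object.
        child = {"_type_": rule_type}
        for name, value in attrs.items():
            child["_attribute_" + name] = value
        children.append(child)
    # A JSON array keeps element order, unlike the keys of a JSON object.
    return {"_attribute_path": path, "_children_": children}


rules = [
    ("exclude", {"type": "file", "match": "*.mov"}),
    ("include", {"type": "file", "match": "*"}),
    ("include", {"type": "directory", "match": "*"}),
]
startpoint = build_startpoint("c:\\path_to_files", rules)
print(json.dumps(startpoint, indent=2))
```

Because every rule is an element of a single array, the excludes stay ahead of the includes no matter which JSON parser reads the document.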
Thanks,
Karl

On Tue, Mar 13, 2018 at 4:33 PM, Karl Wright <daddy...@gmail.com> wrote:

> Right, so the new org.json.simple JSON parser uses hash order for keys.
> That scrambles their order on reading. So unless you intermingle includes
> and excludes within a start point, you are currently at risk of getting the
> order switched on you.
>
> There's a clean-room implementation of the old JSON parser available now;
> I'll have to look into going back to it. But for now I'm going to change
> how output is done so that it only uses arrays if there's a single child
> type possible.
>
> Karl
>
>
> On Tue, Mar 13, 2018 at 4:19 PM, Karl Wright <daddy...@gmail.com> wrote:
>
>> The code has two ways of representing the same thing in JSON. One way
>> collapses similar child types into arrays. The other way (which is used
>> when it's determined that the first way won't maintain order) is quite
>> different. Please see the following code:
>>
>> >>>>>>
>>   /** Get as JSON.
>>   *@return the json corresponding to this Configuration.
>>   */
>>   public String toJSON()
>>     throws ManifoldCFException
>>   {
>>     JSONWriter writer = new JSONWriter();
>>     writer.startObject();
>>     // We do NOT use the root node label, unlike XML.
>>
>>     // Now, do children. To get the arrays right, we need to glue together all children with the
>>     // same type, which requires us to do an appropriate pass to gather that stuff together.
>>     // Since we also need to maintain order, it is essential that we detect the out-of-order condition
>>     // properly, and use an alternate representation if we should find it.
>>     Map<String,List<ConfigurationNode>> childMap = new HashMap<String,List<ConfigurationNode>>();
>>     List<String> childList = new ArrayList<String>();
>>     String lastChildType = null;
>>     boolean needAlternate = false;
>>     int i = 0;
>>     while (i < getChildCount())
>>     {
>>       ConfigurationNode child = findChild(i++);
>>       String key = child.getType();
>>       List<ConfigurationNode> list = childMap.get(key);
>>       if (list == null)
>>       {
>>         list = new ArrayList<ConfigurationNode>();
>>         childMap.put(key,list);
>>         childList.add(key);
>>       }
>>       else
>>       {
>>         if (!lastChildType.equals(key))
>>         {
>>           needAlternate = true;
>>           break;
>>         }
>>       }
>>       list.add(child);
>>       lastChildType = key;
>>     }
>>
>>     if (needAlternate)
>>     {
>>       // Can't use the array representation. We'll need to start a _children_ object, and enumerate
>>       // each child. So, the JSON will look like:
>>       // <key>:{_attribute_<attr>:xxx,_children_:[{_type_:<child_key>, ...},{_type_:<child_key_2>, ...}, ...]}
>>       writer.key(JSON_CHILDREN);
>>       writer.startArray();
>>       i = 0;
>>       while (i < getChildCount())
>>       {
>>         ConfigurationNode child = findChild(i++);
>>         writeNode(writer,child,false,true);
>>       }
>>       writer.endArray();
>>     }
>>     else
>>     {
>>       // We can collapse child nodes to arrays and still maintain order.
>>       // The JSON will look like this:
>>       // <key>:{_attribute_<attr>:xxx,<child_key>:[stuff],<child_key_2>:[more_stuff] ...}
>>       int q = 0;
>>       while (q < childList.size())
>>       {
>>         String key = childList.get(q++);
>>         List<ConfigurationNode> list = childMap.get(key);
>>         if (list.size() > 1)
>>         {
>>           // Write it as an array
>>           writer.key(key);
>>           writer.startArray();
>>           i = 0;
>>           while (i < list.size())
>>           {
>>             ConfigurationNode child = list.get(i++);
>>             writeNode(writer,child,false,false);
>>           }
>>           writer.endArray();
>>         }
>>         else
>>         {
>>           // Write it as a singleton
>>           writeNode(writer,list.get(0),true,false);
>>         }
>>       }
>>     }
>>     writer.endObject();
>>
>>     // Convert to a string.
>>     return writer.toString();
>>   }
>> <<<<<<
>>
>> *IF* the specification from your UI-ordered rules cannot be output as the
>> array-style JSON, *THEN* the alternate representation will be used. That
>> is why I suggested that you hand-order your example job and then output the
>> JSON, because you will see the format that will definitely preserve the
>> order. I strongly suggest using that format to guarantee the order.
>>
>> There is a possibility that we have a bug where the ordering within types
>> is preserved, but the ordering between types is not properly preserved.
>> This is what I suspect is happening. If true, it is because we migrated to
>> a different JSON implementation as a result of legal issues a year or two
>> back. That's what I'm going to look at next. But in any case you should
>> be able to use the order-guaranteed JSON format to get past your problems.
>>
>> Thanks,
>> Karl
>>
>>
>> On Tue, Mar 13, 2018 at 4:02 PM, Karl Wright <daddy...@gmail.com> wrote:
>>
>>> The issue is due to the mapping from XML to JSON. Order is preserved,
>>> but only within each level. So the includes are all in order but all
>>> includes go before all excludes, etc. I'll have to consider how best to
>>> resolve that.
>>>
>>> Karl
>>>
>>> On Tue, Mar 13, 2018 at 3:50 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>
>>>> Hi Maxence,
>>>>
>>>> If you EXPORT a job that works in JSON, and then IMPORT the exported
>>>> JSON into a new job, is that job broken?
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Mar 13, 2018 at 1:50 PM, msaunier <msaun...@citya.com> wrote:
>>>>
>>>>> Hello Karl,
>>>>>
>>>>> I have created 3 situations:
>>>>>
>>>>> 1. Create job manually (1_job_manually.json | 1_job_manually.png)
>>>>> 2. Create job with script and modify the order manually (2_job_mixte.json | 2_job_mixte.png)
>>>>> 3. Create job with script (3_job_script.json | 3_job_script.png)
>>>>>
>>>>> I do not see the difference.
>>>>>
>>>>> So: 1 and 2 work well, with the rules in the right order, but 3 puts the
>>>>> included files and directories first.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Maxence
>>>>>
>>>>>
>>>>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>>>>> *Sent:* Monday, March 12, 2018 21:29
>>>>> *To:* user@manifoldcf.apache.org
>>>>> *Cc:* Fabien Harrang <fharr...@citya.com>; REUILLON Dominique <dreuil...@citya.com>
>>>>> *Subject:* Re: Modify job to add excludes files and directory
>>>>>
>>>>> Here is an idea. Define your job in the UI and use the API to fetch
>>>>> the JSON for it.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Mon, Mar 12, 2018, 12:51 PM Karl Wright <daddy...@gmail.com> wrote:
>>>>>
>>>>> I will need to look at this later tonight before I can respond in
>>>>> detail.
>>>>>
>>>>> The document specification part of the API uses EXACTLY the same data
>>>>> as is stored for the job. The only difference is that the job
>>>>> specification is stored in XML, not JSON. The converters between the two
>>>>> do preserve ordering, however.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Mon, Mar 12, 2018 at 12:38 PM, msaunier <msaun...@citya.com> wrote:
>>>>>
>>>>> *1:*
>>>>>
>>>>> I think I have found a problem in the *file system connector* part of this
>>>>> page: https://manifoldcf.apache.org/release/release-2.9.1/en_US/programmatic-operation.html
>>>>>
>>>>> The page shows this JSON:
>>>>>
>>>>> {"startpoint":[{"_attribute_path":"c:\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc"\,"_attribute_type":"directory","_attribute_match":"*"],"exclude":["*.mov"]]}
>>>>>
>>>>> I think the JSON syntax is bad.
>>>>> I think the correct JSON is:
>>>>>
>>>>> {"startpoint":[{"_attribute_path":"c:\\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc","_attribute_type":"directory","_attribute_match":"*"}],"exclude":["*.mov"]}]}
>>>>>
>>>>> List of corrections (marked with asterisks):
>>>>>
>>>>> {"startpoint":[{"_attribute_path":"c:\*\*path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc"*\*,"_attribute_type":"directory","_attribute_match":"*"*}*],"exclude":["*.mov"]*}*]}
>>>>>
>>>>> But this configuration does not work with the *Windows Share*
>>>>> connector: there is a syntax error on the exclude.
>>>>>
>>>>> *2:*
>>>>>
>>>>> For my problem, the JSON format is not the issue. It works. I attach
>>>>> the JSON generated by my Python script from my database.
>>>>> *(srvics33.json)*
>>>>>
>>>>> If I open the interface after I PUT the configuration, the included
>>>>> files come first and the excluded ones second. *(image1.png)* In my JSON,
>>>>> I added the excludes first, but they end up second.
>>>>>
>>>>> I am forced to go into the interface and reorder them manually to
>>>>> obtain the right result. *(image2.png)*
>>>>>
>>>>> Can I enter an order parameter [1-*] to place excluded files and
>>>>> directories first?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Maxence
>>>>>
>>>>>
>>>>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>>>>> *Sent:* Monday, March 12, 2018 14:38
>>>>> *To:* user@manifoldcf.apache.org
>>>>> *Cc:* Fabien Harrang <fharr...@citya.com>; REUILLON Dominique <dreuil...@citya.com>
>>>>> *Subject:* Re: Modify job to add excludes files and directory
>>>>>
>>>>> Hi Maxence,
>>>>>
>>>>> You can have as many clauses in your JSON rule list as you like.
>>>>> You do not need to have both include and exclude rules in each clause.
>>>>> So you can do in the JSON precisely what you do in the UI.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Mon, Mar 12, 2018 at 9:07 AM, msaunier <msaun...@citya.com> wrote:
>>>>>
>>>>> Ok. I have read this in the documentation:
>>>>>
>>>>> Rules are evaluated from top to bottom, and the first rule that
>>>>> matches the file name is the one that is chosen.
>>>>>
>>>>> But with the API, if I PUT a new job definition with the rules in the
>>>>> right order, ManifoldCF always puts the included documents first. If I
>>>>> need to exclude first, I can't do it through the API definition. I attach
>>>>> the JSON to this email.
>>>>>
>>>>> Does the API have an order parameter for the startpoint's included and
>>>>> excluded files/directories?
>>>>>
>>>>> (PS: I prefer to exclude first and then include * so that I have full
>>>>> control over the document management system (GED) and can keep an eye on
>>>>> the documents.)
>>>>>
>>>>> (PS2: I generate this JSON and send it with a Python script, and that
>>>>> works fine.)
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>>>>> *Sent:* Friday, March 9, 2018 12:53
>>>>> *To:* user@manifoldcf.apache.org
>>>>> *Cc:* Fabien Harrang <fharr...@citya.com>; REUILLON Dominique <dreuil...@citya.com>
>>>>> *Subject:* Re: Modify job to add excludes files and directory
>>>>>
>>>>> Hi Maxence,
>>>>>
>>>>> In the middle of a job run, if you change the specification of what
>>>>> documents are included and excluded, the implementation of the connector
>>>>> determines how it will behave. There is no guarantee that documents that
>>>>> are excluded will be removed, for example if the connector filters
>>>>> documents only when they are queued. You may need to run the job a second
>>>>> time to be sure everything is removed.
>>>>>
>>>>> So the official answer is that "it depends".
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, Mar 9, 2018 at 5:38 AM, msaunier <msaun...@citya.com> wrote:
>>>>>
>>>>> Hello Karl,
>>>>>
>>>>> If I add new files and directories to exclude on a live job, does
>>>>> ManifoldCF delete the previously indexed files that match these
>>>>> exclusions? Or do I need to reseed all of my documents?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Maxence SAUNIER