Right, so the new org.json.simple JSON parser stores object keys in hash order, which scrambles their order on reading. So unless you intermingle includes and excludes within a start point, you are currently at risk of getting the order switched on you.
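To illustrate the failure mode, here is a minimal Python sketch (not ManifoldCF code). It assumes the collapsed, array-style form with one key per child type; the rule values and the helper `flatten_rules` are made up for the example, and the "hash-like" key order is simply a stand-in for whatever order a hash-based map happens to return.

```python
import json

# Collapsed (array-style) form: one key per child type.
# The rule values are hypothetical and only serve the illustration.
collapsed = json.dumps({
    "exclude": [
        {"_attribute_type": "file", "_attribute_match": "*.mov"},
        {"_attribute_type": "file", "_attribute_match": "*.tmp"},
    ],
    "include": [
        {"_attribute_type": "file", "_attribute_match": "*.txt"},
        {"_attribute_type": "directory", "_attribute_match": "*"},
    ],
})


def flatten_rules(obj, key_order):
    """Rebuild a flat rule list by visiting the type keys in key_order."""
    rules = []
    for key in key_order:
        for child in obj.get(key, []):
            rules.append((key, child["_attribute_match"]))
    return rules


obj = json.loads(collapsed)

# Python's dict keeps document order, so this is what the writer intended.
as_written = flatten_rules(obj, list(obj))

# Assumption: a hash-ordered map is free to hand the keys back in this order.
hash_like = flatten_rules(obj, ["include", "exclude"])

print(as_written)
print(hash_like)
```

Order within each type survives in both cases; only the order between types depends on how the reader iterates the object's keys, which is exactly the symptom reported.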
There's a clean-room implementation of the old JSON parser available now; I'll have to look into going back to it. But for now I'm going to change how output is done so that it only uses arrays if there's a single child type possible.

Karl

On Tue, Mar 13, 2018 at 4:19 PM, Karl Wright <[email protected]> wrote:

> The code has two ways of representing the same thing in JSON.  One way
> collapses similar child types into arrays.  The other way (which is used
> when it's determined that the first way won't maintain order) is quite
> different.  Please see the following code:
>
> >>>>>>
>   /** Get as JSON.
>   *@return the json corresponding to this Configuration.
>   */
>   public String toJSON()
>     throws ManifoldCFException
>   {
>     JSONWriter writer = new JSONWriter();
>     writer.startObject();
>     // We do NOT use the root node label, unlike XML.
>
>     // Now, do children.  To get the arrays right, we need to glue together all children with the
>     // same type, which requires us to do an appropriate pass to gather that stuff together.
>     // Since we also need to maintain order, it is essential that we detect the out-of-order condition
>     // properly, and use an alternate representation if we should find it.
>     Map<String,List<ConfigurationNode>> childMap = new HashMap<String,List<ConfigurationNode>>();
>     List<String> childList = new ArrayList<String>();
>     String lastChildType = null;
>     boolean needAlternate = false;
>     int i = 0;
>     while (i < getChildCount())
>     {
>       ConfigurationNode child = findChild(i++);
>       String key = child.getType();
>       List<ConfigurationNode> list = childMap.get(key);
>       if (list == null)
>       {
>         list = new ArrayList<ConfigurationNode>();
>         childMap.put(key,list);
>         childList.add(key);
>       }
>       else
>       {
>         if (!lastChildType.equals(key))
>         {
>           needAlternate = true;
>           break;
>         }
>       }
>       list.add(child);
>       lastChildType = key;
>     }
>
>     if (needAlternate)
>     {
>       // Can't use the array representation.  We'll need to start do a _children_ object, and enumerate
>       // each child.  So, the JSON will look like:
>       // <key>:{_attribute_<attr>:xxx,_children_:[{_type_:<child_key>, ...},{_type_:<child_key_2>, ...}, ...]}
>       writer.key(JSON_CHILDREN);
>       writer.startArray();
>       i = 0;
>       while (i < getChildCount())
>       {
>         ConfigurationNode child = findChild(i++);
>         writeNode(writer,child,false,true);
>       }
>       writer.endArray();
>     }
>     else
>     {
>       // We can collapse child nodes to arrays and still maintain order.
>       // The JSON will look like this:
>       // <key>:{_attribute_<attr>:xxx,<child_key>:[stuff],<child_key_2>:[more_stuff] ...}
>       int q = 0;
>       while (q < childList.size())
>       {
>         String key = childList.get(q++);
>         List<ConfigurationNode> list = childMap.get(key);
>         if (list.size() > 1)
>         {
>           // Write it as an array
>           writer.key(key);
>           writer.startArray();
>           i = 0;
>           while (i < list.size())
>           {
>             ConfigurationNode child = list.get(i++);
>             writeNode(writer,child,false,false);
>           }
>           writer.endArray();
>         }
>         else
>         {
>           // Write it as a singleton
>           writeNode(writer,list.get(0),true,false);
>         }
>       }
>     }
>     writer.endObject();
>
>     // Convert to a string.
>     return writer.toString();
>   }
> <<<<<<
>
> *IF* the specification from your UI-ordered rules cannot be output as the
> array-style JSON, *THEN* the alternate representation will be used.  That
> is why I suggested that you hand-order your example job and then output the
> JSON, because you will see the format that will definitely preserve the
> order.  I strongly suggest using that format to guarantee the order.
>
> There is a possibility that we have a bug where the ordering within types
> is preserved, but the ordering between types is not properly preserved.
> This is what I suspect is happening.  If true, it is because we migrated to
> a different JSON implementation as a result of legal issues a year or two
> back.  That's what I'm going to look at next.  But in any case you should
> be able to use the order-guaranteed JSON format to get past your problems.
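For a client script, the order-guaranteed form could be produced along these lines. This is only a sketch: the `_children_` and `_type_` key names come from the code comment quoted above, but the helper `startpoint_with_children`, the sample rules, and the exact nesting under `startpoint` are assumptions. Check the result against the JSON exported from a hand-ordered job before relying on it.

```python
import json


def startpoint_with_children(path, rules):
    """Build one startpoint whose rules live in a single ordered list.

    rules is an ordered list of (rule_kind, target_type, match) tuples,
    for example ("exclude", "file", "*.mov"). The function name and the
    tuple layout are hypothetical, chosen for this example only.
    """
    children = [
        {"_type_": kind, "_attribute_type": target, "_attribute_match": match}
        for kind, target, match in rules
    ]
    # A JSON array keeps its element order, unlike the keys of an object.
    return {"_attribute_path": path, "_children_": children}


startpoint = startpoint_with_children(
    "c:\\path_to_files",
    [
        ("exclude", "file", "*.mov"),
        ("include", "file", "*.txt"),
        ("include", "directory", "*"),
    ],
)

# Assumed wrapper: the surrounding job document is not shown here.
payload = json.dumps({"startpoint": [startpoint]})
print(payload)
```

Because every rule sits in one array, the exclude stays ahead of the includes no matter how the receiving parser orders object keys.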
>
> Thanks,
> Karl
>
>
> On Tue, Mar 13, 2018 at 4:02 PM, Karl Wright <[email protected]> wrote:
>
>> The issue is due to the mapping from XML to JSON.  Order is preserved,
>> but only within each level.  So the includes are all in order but all
>> includes go before all excludes, etc.  I'll have to consider how best to
>> resolve that.
>>
>> Karl
>>
>> On Tue, Mar 13, 2018 at 3:50 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Maxence,
>>>
>>> If you EXPORT a job that works in JSON, and then IMPORT the exported
>>> JSON into a new job, is that job broken?
>>>
>>> Karl
>>>
>>>
>>> On Tue, Mar 13, 2018 at 1:50 PM, msaunier <[email protected]> wrote:
>>>
>>>> Hello Karl,
>>>>
>>>> I have created 3 situations:
>>>>
>>>> 1. Job created manually (1_job_manually.json | 1_job_manually.png)
>>>> 2. Job created with the script, then reordered manually
>>>>    (2_job_mixte.json | 2_job_mixte.png)
>>>> 3. Job created with the script (3_job_script.json | 3_job_script.png)
>>>>
>>>> I do not see the difference.
>>>>
>>>> So 1 and 2 work well, with the right order, but 3 has the included files
>>>> and directories first.
>>>>
>>>> Thanks,
>>>> Maxence
>>>>
>>>> *From:* Karl Wright [mailto:[email protected]]
>>>> *Sent:* Monday, March 12, 2018 21:29
>>>> *To:* [email protected]
>>>> *Cc:* Fabien Harrang <[email protected]>; REUILLON Dominique <[email protected]>
>>>> *Subject:* Re: Modify job to add excludes files and directory
>>>>
>>>> Here is an idea.  Define your job in the UI and use the API to fetch
>>>> the JSON for it.
>>>>
>>>> Karl
>>>>
>>>> On Mon, Mar 12, 2018, 12:51 PM Karl Wright <[email protected]> wrote:
>>>>
>>>> I will need to look at this later tonight before I can respond in
>>>> detail.
>>>>
>>>> The document specification part of the API uses EXACTLY the same data
>>>> as is stored for the job.
>>>> The only difference is that the job specification is stored in XML,
>>>> not JSON.  The converters between the two do preserve ordering, however.
>>>>
>>>> Karl
>>>>
>>>> On Mon, Mar 12, 2018 at 12:38 PM, msaunier <[email protected]> wrote:
>>>>
>>>> *1:*
>>>>
>>>> I have found a problem in the *file system connector* part of this
>>>> page (I think):
>>>> https://manifoldcf.apache.org/release/release-2.9.1/en_US/programmatic-operation.html
>>>>
>>>> It shows this JSON:
>>>>
>>>> {"startpoint":[{"_attribute_path":"c:\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc"\,"_attribute_type":"directory","_attribute_match":"*"],"exclude":["*.mov"]]}
>>>>
>>>> I think the JSON syntax is wrong. I think the correct JSON is:
>>>>
>>>> {"startpoint":[{"_attribute_path":"c:\\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc","_attribute_type":"directory","_attribute_match":"*"}],"exclude":["*.mov"]}]}
>>>>
>>>> List of corrections (marked with asterisks):
>>>>
>>>> {"startpoint":[{"_attribute_path":"c:\*\*path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc"*\*,"_attribute_type":"directory","_attribute_match":"*"*}*],"exclude":["*.mov"]*}*]}
>>>>
>>>> But this configuration does not work with the *Windows Share*
>>>> connector: there is a syntax error on the exclude.
>>>>
>>>> *2:*
>>>>
>>>> For my problem, the JSON format is not the issue. It works. I attach the
>>>> JSON, generated with my Python script and my database.
>>>> *(srvics33.json)*
>>>>
>>>> If I go to the interface after PUTting the configuration, the included
>>>> files come first and the excluded ones second.
>>>> *(image1.png)* In my JSON, I added the excludes first, but they end up
>>>> second.
>>>>
>>>> I am forced to go into the interface and manually change the order to
>>>> obtain a good result. *(image2.png)*
>>>>
>>>> Can I enter an order parameter [1-*] to place excluded files and
>>>> directories first?
>>>>
>>>> Thanks.
>>>>
>>>> Maxence
>>>>
>>>> *From:* Karl Wright [mailto:[email protected]]
>>>> *Sent:* Monday, March 12, 2018 14:38
>>>> *To:* [email protected]
>>>> *Cc:* Fabien Harrang <[email protected]>; REUILLON Dominique <[email protected]>
>>>> *Subject:* Re: Modify job to add excludes files and directory
>>>>
>>>> Hi Maxence,
>>>>
>>>> You can have as many clauses in your JSON rule list as you like.  You
>>>> do not need to have both include and exclude rules in each clause.  So you
>>>> can precisely do in the JSON what you do in the UI.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>> On Mon, Mar 12, 2018 at 9:07 AM, msaunier <[email protected]> wrote:
>>>>
>>>> OK. I have read this in the documentation:
>>>>
>>>> Rules are evaluated from top to bottom, and the first rule that
>>>> matches the file name is the one that is chosen.
>>>>
>>>> But in the API, if I PUT a new job definition with the right order,
>>>> ManifoldCF puts the included documents first every time.  If I need to
>>>> exclude first, I can't with the API definition.  I attach the JSON to
>>>> this email.
>>>>
>>>> Does the API have an order parameter for the startpoint and the
>>>> included and excluded files/directories?
>>>>
>>>> (PS: I prefer to exclude first and then include * to have total control
>>>> over the GED and keep an eye on the documents.)
>>>>
>>>> (PS2: I generate this JSON and send it with a Python script, and it
>>>> works well.)
>>>>
>>>> Thanks
>>>>
>>>> *From:* Karl Wright [mailto:[email protected]]
>>>> *Sent:* Friday, March 9, 2018 12:53
>>>> *To:* [email protected]
>>>> *Cc:* Fabien Harrang <[email protected]>; REUILLON Dominique <[email protected]>
>>>> *Subject:* Re: Modify job to add excludes files and directory
>>>>
>>>> Hi Maxence,
>>>>
>>>> In the middle of a job run, if you change the specification of what
>>>> documents are included and excluded, the implementation of the connector
>>>> determines how it will behave.  There is no guarantee that documents that
>>>> are excluded will be removed, for example if the connector filters
>>>> documents only when they are queued.  You may need to run the job a second
>>>> time to be sure everything is removed.
>>>>
>>>> So the official answer is that "it depends".
>>>>
>>>> Karl
>>>>
>>>> On Fri, Mar 9, 2018 at 5:38 AM, msaunier <[email protected]> wrote:
>>>>
>>>> Hello Karl,
>>>>
>>>> If I add new files and directories to exclude on a live job, does
>>>> ManifoldCF delete previously indexed files that match these exclusions?
>>>> Or do I need to reseed all of my documents?
>>>>
>>>> Thank you.
>>>>
>>>> Maxence SAUNIER
