On Sun, Dec 18, 2011 at 13:52, Antonio Valentino
<antonio.valent...@tiscali.it> wrote:
>> My patch creates keyword tags from reference labels instead of index
>> entries. It sets the id attribute to the reference label name and does
>> not include a name attribute (so the keyword does not show up in the
>> index). This enables the program to display locations in the document
>> without requiring that they have a corresponding index entry.
>>
>
> Ok.
> My concern regards the fact that almost all the keyword entries added in
> this way are in some manner duplicates of existing ones.
> Just they have a different id.

I just tried this out with the Sphinx documentation and I see your
point. It seems that there are some directives that create both a
reference label and an index entry. In this case, we would end up with
a duplicate entry. In some circumstances, we might even get
conflicting IDs and unpredictable results.

Where the documentation has an explicit reference label, the
corresponding keyword will only show up when using my patch. I think the
cleanest solution would be to only use reference labels to generate ID
attributes and only use index entries to generate name attributes.
However, this isn't really an option since some projects may already
depend on the IDs as they are currently generated.

Perhaps the best option, then, is to generate the new set of IDs but
ensure that they cannot conflict with any previously-generated IDs.

I'm attaching a patch against the code at
bitbucket.org/avalentino/sphinx with the following changes (maybe I
should use hg export? I'm not familiar with hg):
1. The keyword_item method now returns a tuple (name, id, ref) rather
than a complete "<keyword ...>" tag
2. build_qhp() keeps a dictionary of all existing IDs so we can be
sure that they are not overwritten by any of the reference-label IDs
(this has the added benefit of removing most of the duplicate
reference entries you mentioned)
3. build_qhp() compiles all keyword items from index entries and
reference labels in one location.
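
To illustrate the idea behind these three changes outside of Sphinx, here
is a minimal standalone sketch. The function names merge_keyword_items and
render_keyword are hypothetical and do not appear in the patch; the sketch
also skips entries without an ID when building the lookup table, which is a
simplification of what the patch does.

```python
from xml.sax.saxutils import escape


def merge_keyword_items(index_items, label_items):
    """Combine index-entry keywords with reference-label keywords.

    index_items: list of (name, id, ref) tuples from index entries.
    label_items: list of (id, ref) pairs from reference labels.
    (Both input shapes are assumptions made for this sketch.)
    """
    merged = list(index_items)
    # IDs that already exist must never be overridden by a label ID.
    seen = {id_: ref for _, id_, ref in index_items if id_ is not None}
    for id_, ref in label_items:
        if id_ in seen:
            continue  # keep the previously generated ID
        merged.append((None, id_, ref))  # no name, so not shown in the index
        seen[id_] = ref
    return merged


def render_keyword(name, id_, ref, indent=12):
    """Render one (name, id, ref) tuple as a <keyword .../> tag."""
    quote = {'"': '&quot;'}
    attrs = []
    if name is not None:
        attrs.append('name="%s"' % escape(name, quote))
    if id_ is not None:
        attrs.append('id="%s"' % escape(id_, quote))
    attrs.append('ref="%s"' % escape(ref, quote))
    return ' ' * indent + '<keyword %s/>' % ' '.join(attrs)
```

With this, a label whose ID matches an existing index entry is dropped,
while a label with a new ID produces a keyword that has id and ref but no
name.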

There is also a block of commented code that displays the IDs that
would have conflicted with previous IDs. Interestingly, it seems there
are a lot of places in the Sphinx code that generate non-unique
reference labels. I'm not sure if this was an oversight or if I'm not
quite grasping the full picture (the latter is more likely).
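
For anyone who wants to look at those conflicts without uncommenting the
block, the check amounts to something like the sketch below. It is
standalone Python, not code from the patch; find_conflicting_ids is a
hypothetical helper, and the (id, ref) input format is an assumption.

```python
from collections import defaultdict


def find_conflicting_ids(pairs):
    """Return the IDs that point at more than one distinct target.

    pairs: iterable of (id, ref) tuples, in the order they were generated.
    The result maps each conflicting ID to its list of distinct refs.
    """
    targets = defaultdict(list)
    for id_, ref in pairs:
        # Repeats of the same (id, ref) pair are harmless duplicates.
        if ref not in targets[id_]:
            targets[id_].append(ref)
    return {id_: refs for id_, refs in targets.items() if len(refs) > 1}
```

An ID that appears twice with the same target is not reported; only IDs
with two or more different targets are.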


Luke


diff -r 845a3d8d5ff0 sphinx/builders/qthelp.py
--- a/sphinx/builders/qthelp.py	Sun Dec 18 19:39:32 2011 +0100
+++ b/sphinx/builders/qthelp.py	Thu Dec 22 13:03:09 2011 -0500
@@ -141,24 +141,49 @@
                 new_sections.append(section)
         sections = u'\n'.encode('utf-8').join(new_sections)
 
-        # keywords
-        keywords = []
+        # keywords from index entries
+        keyword_items = []  # list of (name, id, ref) items to be added to keyword list
         index = self.env.create_index(self, group_entries=False)
         for (key, group) in index:
             for title, (refs, subitems) in group:
-                keywords.extend(self.build_keywords(title, refs, subitems))
+                keyword_items.extend(self.build_keywords(title, refs, subitems))
+                
+        index_ids = dict([item[1:] for item in keyword_items])
 
         # ID keywords from xref labels
-        item_template = ' ' * 12 + '<keyword id="%s" ref="%s"/>'
         for domain in self.env.domains.itervalues():
             for name, dispname, _, docname, anchor, _ in domain.get_objects():
                 if not anchor:
                     continue
-                uri = self.get_target_uri(docname) + '#' + anchor
-                item = item_template % (escape(name), escape(uri))
-                item.encode('ascii', 'xmlcharrefreplace')
-                keywords.append(item)
+                uri = escape(self.get_target_uri(docname) + '#' + anchor)
+                id_ = escape(name)
+                
+                if id_ not in index_ids:  ## Do not override any previously-existing IDs
+                    keyword_items.append((None, id_, uri))
+                    index_ids[id_] = uri
+                #else:
+                    #if index_ids[id_] == uri:
+                        #print "Skipping ID", id_, " (same reference)"
+                    #else:
+                        #print "Skipping ID", id_
+                        #print "        keep: ", index_ids[id_]
+                        #print "      ignore: ", uri
+                    
 
+        # generate keyword text
+        keywords = []
+        item_template = ' ' * 12 + '<keyword %s%sref="%s"/>'
+        for name, id_, ref in keyword_items:
+            if name is None:
+                name = ""
+            else:
+                name = 'name="%s" ' % name
+            if id_ is None:
+                id_ = ""
+            else:
+                id_ = 'id="%s" ' % id_
+            keywords.append(item_template % (name, id_, ref))
+        
         keywords = '\n'.join(keywords)
 
         # files
@@ -275,13 +300,7 @@
         else:
             id_ = None
 
-        if id:
-            item = ' ' * 12 + '<keyword name="%s" id="%s" ref="%s"/>' % (
-                name, id_, ref[1])
-        else:
-            item = ' ' * 12 + '<keyword name="%s" ref="%s"/>' % (name, ref[1])
-        item.encode('ascii', 'xmlcharrefreplace')
-        return item
+        return (name, id_, ref[1])
 
     def build_keywords(self, title, refs, subitems):
         keywords = []
