[jira] Commented: (SOLR-1437) DIH: Enhance XPathRecordReader to deal with //tagname and other improvements.

Fergus McMenemie (JIRA) Wed, 23 Sep 2009 03:47:48 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758652#action_12758652
 ]


Fergus McMenemie commented on SOLR-1437:
----------------------------------------

Noble,

Playing with the code... some observations I would like confirmed.

1) Inside parse(), the valuesAddedinThisFrame HashSet and the Stack<Set<String>> 
stack variable are used only to aid the clean-up after outputting a record.

2) The code seems unable to collect the text of the element matched by a 
forEach xpath. For example, given the following fragment of code

{code}
    String xml="<root>\n"
             + "  <status>live</status>\n"
             + "  <contenido id=\"10097\" idioma=\"cat\">\n"
             + "    Cats can be cute\n"
             + "    <antetitulo></antetitulo>\n"
             + "    <titulo>\n           This is my title\n    </titulo>\n"
             + "    <resumen>\n          This is my summary\n   </resumen>\n"
             + "    <texto>\n     This is the body of my text\n   </texto>\n"
             + "    </contenido>\n"
             + "</root>";
    XPathRecordReader rr = new XPathRecordReader("/root/contenido");
    rr.addField("cat"   ,"/root/contenido", false); //  ***** FAILS *****
    rr.addField("id",    "/root/contenido/@id", false);
{code}

we can get the string associated with the id attribute of <contenido> but not 
its child text! Is this a design goal, or just the way the code ended up 
behaving? Do we want it to continue to work this way?
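
To make the behaviour being asked about concrete, here is a minimal sketch of 
what "collecting the child text of the forEach element" could look like. It 
uses only the JDK's StAX API and is not XPathRecordReader's actual code; the 
class name DirectTextSketch, the method readContenido and the depth-counting 
approach are all assumptions made for illustration. It keeps character data 
only when the parser is directly inside <contenido>, so text belonging to 
<titulo>, <resumen> and <texto> is skipped.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not part of Solr or DataImportHandler.
final class DirectTextSketch {

    /** Returns the id attribute and the direct text of the first /root/contenido. */
    static Map<String, String> readContenido(String xml) throws XMLStreamException {
        XMLStreamReader in = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Map<String, String> record = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        int depth = 0;        // current element depth; <root> is depth 1
        int recordDepth = -1; // depth of <contenido> once entered, else -1
        while (in.hasNext()) {
            switch (in.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    depth++;
                    if (recordDepth < 0 && depth == 2
                            && "contenido".equals(in.getLocalName())) {
                        recordDepth = depth;
                        record.put("id", in.getAttributeValue(null, "id"));
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    // Keep text only when directly inside <contenido>,
                    // not inside one of its child elements.
                    if (depth == recordDepth) {
                        text.append(in.getText());
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (depth == recordDepth) {
                        // Closing </contenido>: emit the record.
                        record.put("cat", text.toString().trim());
                        in.close();
                        return record;
                    }
                    depth--;
                    break;
                default:
                    break;
            }
        }
        in.close();
        return record;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root>\n"
                + "  <status>live</status>\n"
                + "  <contenido id=\"10097\" idioma=\"cat\">\n"
                + "    Cats can be cute\n"
                + "    <antetitulo></antetitulo>\n"
                + "    <titulo>\n           This is my title\n    </titulo>\n"
                + "    </contenido>\n"
                + "</root>";
        // Prints {id=10097, cat=Cats can be cute}
        System.out.println(readContenido(xml));
    }
}
```

Whether XPathRecordReader should behave like this for a field whose xpath 
equals the forEach xpath is exactly the open question above; the sketch only 
shows that the direct text is recoverable from a streaming parse.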

> DIH: Enhance XPathRecordReader to deal with //tagname and other improvements.
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-1437
>                 URL: https://issues.apache.org/jira/browse/SOLR-1437
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Fergus McMenemie
>            Assignee: Noble Paul
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1437.patch, SOLR-1437.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> As per 
> http://www.nabble.com/Re%3A-Extract-info-from-parent-node-during-data-import-%28redirect%3A%29-td25471162.html
>  it would be nice to be able to use expressions such as //tagname when 
> parsing XML documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
