Solr541 Carriage Return Stripped Off In String Field ?

Kosila Yuichiro Tue, 02 Feb 2016 22:26:14 -0800

Hello.
I have a question regarding the "string" field type.

[ Symptom ]
When a string value that includes a carriage return and line feed (\r\n)
is passed to a string field, it is stored. However,
when I query that document and look at the value of the field,
the carriage return has been stripped off.


[ Question ]
Is this the expected behavior ?

[ Environment ]
Apache Solr 5.4.1
Documents added via SolrJ

[ How To Reproduce ]

(1)  Download Apache Solr 5.4.1
(2)  Create a core , "test"

(3)  Prepare two fields,  "id" and "field20"
     Assign the following attributes to those fields ;
       -  type="string"  indexed="true"  stored="true"  required="true"
          multiValued="false"

(4)  Start up Solr and, from the Admin UI,
     make sure that everything is working with no errors,
     and confirm that the two defined fields are available.

(5)  Write a small test program using SolrJ
     that inserts a document and then queries for it.
     Jar files used ;
        - apache-solr-solrj-5.4.0.jar
        - apache-solr-core-5.4.0.jar
        - commons-codec-1.9.jar
        - httpclient-4.5.1.jar
        - commons-io-2.4.jar
        - slf4j-api-1.7.13.jar
        - jcl-over-slf4j-1.7.14.jar
        - slf4j-jdk14-1.7.14.jar

(6)  Insert a document where the value of field20 is "ABC\r\nDEF"
(7)  When I query that document, from both the Admin UI and SolrJ,
     the value comes back as "ABC\nDEF" , where "\r" has been stripped off.
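
The comparison in steps (6) and (7) can also be sketched without Solr, as a small helper that prints the character codes so that CR (13) and LF (10) are visible. The class name CrLfDump and its methods are hypothetical, for illustration only; they are not part of SolrJ or of the test program below.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of SolrJ): makes CR (13) and LF (10) visible.
class CrLfDump {

    // Returns the US-ASCII byte codes of the value, separated by single spaces.
    static String codes(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(bytes[i]);
        }
        return out.toString();
    }

    // True if the value contains at least one carriage return.
    static boolean hasCarriageReturn(String value) {
        return value.indexOf('\r') >= 0;
    }

    public static void main(String[] args) {
        System.out.println(codes("ABC\r\nDEF")); // value sent in step (6)
        System.out.println(codes("ABC\nDEF"));   // value seen in step (7)
    }
}
```

Running main prints "65 66 67 13 10 68 69 70" for the value that was sent and "65 66 67 10 68 69 70" for the value that came back, so the missing 13 is the stripped carriage return.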


[ Test Code ]

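package solrtest ;

import java.io.IOException ;
import java.util.ArrayList ;
import java.util.Collection ;
import java.util.Iterator ;
import java.util.ListIterator ;
import java.util.Map ;

import org.apache.solr.client.solrj.SolrQuery ;
import org.apache.solr.client.solrj.SolrServerException ;
import org.apache.solr.client.solrj.impl.HttpSolrServer ;
import org.apache.solr.client.solrj.impl.XMLResponseParser ;
import org.apache.solr.client.solrj.response.QueryResponse ;
import org.apache.solr.common.SolrDocument ;
import org.apache.solr.common.SolrDocumentList ;
import org.apache.solr.common.SolrInputDocument ;

public class SolrTest {

  public static void main(String[] args) throws IOException, SolrServerException {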

    String url = "http://localhost:8983/solr/test" ;
    HttpSolrServer server = new HttpSolrServer(url) ;
    server.setParser(new XMLResponseParser()) ;

    String mydata = "ABC\r\nDEF" ;
    byte[] asciiCodes = mydata.getBytes("US-ASCII") ;
    System.out.println (asciiCodes[3] + " , " + asciiCodes[4]) ;

    SolrInputDocument mydoc = new SolrInputDocument() ;
    mydoc.addField ( "id"      , "98765" , 1.0f ) ;
    mydoc.addField ( "field20" , mydata  , 1.0f ) ;

    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>() ;
    docs.add ( mydoc ) ;
    server.add ( docs ) ;
    server.commit () ;

    SolrQuery myquery = new SolrQuery() ;
    myquery.setQuery (" id:98765" ) ;
    QueryResponse rsp = server.query(myquery) ;
    SolrDocumentList hits = rsp.getResults() ;

    String target = "" ;
    int pos = 0 ;
    while ( pos < hits.getNumFound() ) {

      ListIterator<SolrDocument> docloop = hits.listIterator() ;

      while ( docloop.hasNext() ) {
        pos++ ;

        SolrDocument hitdoc = docloop.next() ;
        Map<String, Collection<Object>> fieldvalues = hitdoc.getFieldValuesMap() ;
        Iterator<String> fieldnames = hitdoc.getFieldNames().iterator() ;

        while ( fieldnames.hasNext() ) {

          String fieldname = fieldnames.next() ;

          Collection<Object> cellvalues = fieldvalues.get(fieldname) ;
          Iterator<Object> valueloop = cellvalues.iterator() ;

          while ( valueloop.hasNext() ) {
            Object cellobj = valueloop.next() ;
            String cellvalue = cellobj.toString() ;

            if ( fieldname.equals("field20") ) {
              target = cellvalue ;
            }

          }
        }
      }
    }

    asciiCodes = target.getBytes("US-ASCII") ;
    for ( int i=0 ; i < target.length() ; i++ ) {
      System.out.print ( asciiCodes[i] + " " ) ;
    }
    System.out.println ("\r\n") ;

    server.close() ;

  }
}
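
As a side note, and only as an assumption on my side: the test code uses XMLResponseParser, and XML 1.0 parsers normalize \r\n to \n when reading character data. The sketch below shows that behavior with the JDK's own DOM parser only. It does not involve Solr, and I have not confirmed that this is what happens inside Solr or SolrJ. The class XmlEolDemo and the element names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo (JDK only, no Solr): XML end-of-line handling.
class XmlEolDemo {

    // Wraps the raw text in a tiny XML document, parses it with the JDK
    // DOM parser, and returns the text content the parser reports.
    static String roundTrip(String rawXmlText) {
        String xml = "<doc><field>" + rawXmlText + "</field></doc>";
        try {
            Document parsed = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return parsed.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A literal CR LF pair in the XML text is normalized to a single LF.
        String literal = roundTrip("ABC\r\nDEF");
        System.out.println(literal.indexOf('\r') >= 0);   // false

        // A CR written as the character reference &#13; is kept.
        String escaped = roundTrip("ABC&#13;\nDEF");
        System.out.println(escaped.indexOf('\r') >= 0);   // true
    }
}
```

If something like this is the cause, the "\r" would be lost wherever the raw value passes through an XML parser without being escaped, but that is a guess and not something the test above proves.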

--

Thank you in advance.
Yuichiro Kosila , Tokyo/Japan
