[ 
https://issues.apache.org/jira/browse/SOLR-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748184#action_12748184
 ] 

frank farmer commented on SOLR-1091:
------------------------------------

My concern is not that solr do anything specific with this garbled data, only 
that wt=phps always returns a string that can be run through unserialize() 
without error.

Here's the exact case in which I encountered this bug, which may help explain 
why I reported this issue in the first place:

1) Somehow, a user inserted the aforementioned sequence of bytes into some 
user-editable content in my application.
2) My code blindly passed that data directly into solr. (In retrospect, I 
should probably be filtering out anything that's not valid UTF-8.)
3) Users ran queries which included the affected document.
4) My code tried to unserialize() the output, and failed with a PHP error. 
This left my users unable to retrieve results for many queries. (Simply 
replacing the offending "s:4:" with "s:6:" allowed the output to unserialize 
without issue.)
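
The filtering mentioned in step 2 could look roughly like the sketch below. It 
is written in Python purely for illustration (the application in question is 
PHP), and the helper names is_valid_utf8 and scrub_invalid_utf8 are 
hypothetical, not part of Solr or of any client library. It relies on Python's 
strict UTF-8 decoder, which rejects encoded surrogates such as the six bytes 
from this issue.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # strict mode rejects encoded surrogates
        return True
    except UnicodeDecodeError:
        return False


def scrub_invalid_utf8(data: bytes) -> bytes:
    """Replace every undecodable byte with U+FFFD and re-encode as UTF-8."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


# The six bytes from the issue description.
bad = b"\xED\xAF\x80\xED\xB1\xB8"

print(is_valid_utf8(bad))                      # False
print(is_valid_utf8(scrub_invalid_utf8(bad)))  # True
```

Whether to reject such input outright or scrub it before indexing is an 
application decision; either approach keeps these bytes out of the index.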

Long story short, if you let users insert arbitrary byte sequences into your 
index (which I'll admit is naive, but I'm sure I'm not the only one who's done 
this), and you use wt=phps, a malicious user can effectively cause a DoS.

Again, I don't care about actually getting these bytes back out of solr 
unmangled.  I only care that the output of wt=phps make it through 
unserialize() without causing a PHP error.

> "phps" (serialized PHP) writer produces invalid output
> ------------------------------------------------------
>
>                 Key: SOLR-1091
>                 URL: https://issues.apache.org/jira/browse/SOLR-1091
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 1.3
>         Environment: Sun JRE 1.6.0 on Centos 5
>            Reporter: frank farmer
>            Priority: Minor
>             Fix For: 1.4
>
>
> The serialized PHP output writer can output invalid string lengths for 
> certain (unusual) input values.  Specifically, I had a document containing 
> the following 6 byte character sequence: \xED\xAF\x80\xED\xB1\xB8
> I was able to create a document in the index containing this value without 
> issue; however, when fetching the document back out using the serialized PHP 
> writer, it returns a string like the following:
> s:4:"􀁸";
> Note that the string length specified is 4, while the string is actually 6 
> bytes long.
> When using PHP's native serialize() function, it correctly sets the length to 
> 6:
> # php -r 'var_dump(serialize("\xED\xAF\x80\xED\xB1\xB8"));'
> string(13) "s:6:"􀁸";"
> The "wt=php" writer, which produces output to be parsed with eval(), doesn't 
> have any trouble with this string.
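
One possible explanation for the 4 versus 6 discrepancy, which is an 
assumption and not something confirmed in this report, is that the six bytes 
are a surrogate pair encoded as two separate 3-byte sequences, while the 
writer computes the length of the combined code point encoded as standard 
UTF-8. The Python sketch below only illustrates that arithmetic; the function 
name recombine_surrogates is hypothetical and this is not Solr's code.

```python
def recombine_surrogates(data: bytes) -> str:
    """Decode separately encoded surrogates and join them into one code point."""
    # "surrogatepass" lets the UTF-8 decoder accept encoded surrogates.
    lone = data.decode("utf-8", errors="surrogatepass")
    # A round trip through UTF-16 merges the surrogate pair into one character.
    return lone.encode("utf-16", errors="surrogatepass").decode("utf-16")


# The six bytes from the issue description.
raw = b"\xED\xAF\x80\xED\xB1\xB8"
combined = recombine_surrogates(raw)

print(len(raw))                       # 6
print(hex(ord(combined)))             # 0x100078
print(len(combined.encode("utf-8")))  # 4
```

If that assumption holds, the declared length of 4 and the 6 bytes actually 
emitted come from two different encodings of the same character.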

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
