[jira] [Commented] (COUCHDB-3173) Views return corrupt data for text fields containing non-BMP characters

2016-10-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545556#comment-15545556
 ] 

ASF GitHub Bot commented on COUCHDB-3173:
-

Github user asfgit closed the pull request at:

https://github.com/apache/couchdb-couch/pull/202


> Views return corrupt data for text fields containing non-BMP characters
> ---
>
> Key: COUCHDB-3173
> URL: https://issues.apache.org/jira/browse/COUCHDB-3173
> Project: CouchDB
>  Issue Type: Bug
>  Components: JavaScript View Server
>Affects Versions: 2.0.0
>Reporter: Loke
>
> When inserting a non-BMP character (i.e. a character with a Unicode code point
> above {{U+FFFF}}), the content gets corrupted after reading it from a view.
> At every instance of such a character, there is an extra {{U+FFFD REPLACEMENT
> CHARACTER}} inserted into the text.
> To reproduce, use the following commands.
> Create the document containing a field with the character {{U+1F604 SMILING 
> FACE WITH OPEN MOUTH AND SMILING EYES}}:
> {noformat}
> $ curl -X PUT -d '{"type":"foo","value":"😄"}' http://localhost:5984/foo/foo2
> {"ok":true,"id":"foo2","rev":"1-d7da3cd352ef74f6391cc13601081214"}
> {noformat}
> Get the document to ensure that it was saved properly:
> {noformat}
> $ curl -X GET http://localhost:5984/foo/foo2
> {"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄"}
> {noformat}
> Create a view that will return that document:
> {noformat}
> $ curl --user user:password -X PUT -d '{"language":"javascript","views":{"v":{"map":"function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"}}}' http://localhost:5984/foo/_design/bugdemo
> {"ok":true,"id":"_design/bugdemo","rev":"1-817af2dafecb4cf8213aa7063551daac"}
> {noformat}
> Get the document from the view:
> {noformat}
> $ curl -X GET  http://localhost:5984/foo/_design/bugdemo/_view/v
> {"total_rows":1,"offset":0,"rows":[
> {"id":"foo2","key":"foo2","value":{"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄�"}}
> ]}
> {noformat}
> The field {{value}} now contains two characters: the original character
> as well as {{U+FFFD}}.
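
For readers who want to check their own view output for this symptom, here is a minimal sketch in Python. It is not part of the original report; the helper names `utf16_surrogates` and `has_stray_replacement` are hypothetical. It shows how U+1F604 splits into a UTF-16 surrogate pair and how to spot a U+FFFD that directly follows a non-BMP character.

```python
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def utf16_surrogates(code_point):
    """Split a non-BMP code point into its (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a non-BMP code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def has_stray_replacement(text):
    """Return True if any non-BMP character is immediately followed by U+FFFD."""
    return any(
        ord(ch) > 0xFFFF and nxt == REPLACEMENT
        for ch, nxt in zip(text, text[1:])
    )
```

Applied to the {{value}} field of a row returned by the view, `has_stray_replacement` would flag the corrupted output described above. A legitimate U+FFFD that happens to follow a non-BMP character in the stored document would also be flagged, so the result is only a hint.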



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COUCHDB-3173) Views return corrupt data for text fields containing non-BMP characters

2016-10-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1554#comment-1554
 ] 

ASF subversion and git services commented on COUCHDB-3173:
--

Commit 37d3778172ca354f124334edf13bc09d9abc28bc in couchdb-couch's branch 
refs/heads/master from [~paul.joseph.davis]
[ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=37d3778 ]

Fix CouchJS character replacement

This was a bad backport from an old bug. We accidentally backed up when
looking at the second half of a surrogate pair. Instead the backup
should only happen when we see a low half of a surrogate pair with no
preceding high half.

COUCHDB-3173
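
The rule described in the commit message can be illustrated with a short Python sketch. This is not the CouchJS code (which is C) and does not reproduce its "backup" mechanics; `decode_utf16_units` is a hypothetical name, and the function only shows the intended outcome: a valid high/low pair yields one code point with nothing extra, while an unpaired surrogate is replaced with U+FFFD.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_high(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    return 0xDC00 <= unit <= 0xDFFF


def decode_utf16_units(units):
    """Convert a list of UTF-16 code units into a list of code points."""
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        if is_high(unit) and i + 1 < len(units) and is_low(units[i + 1]):
            # Valid pair: consume both halves and emit a single code point.
            out.append(0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif is_high(unit) or is_low(unit):
            # Unpaired surrogate (e.g. a low half with no preceding high half).
            out.append(REPLACEMENT)
            i += 1
        else:
            out.append(unit)
            i += 1
    return out
```

With this logic the pair `0xD83D 0xDE04` decodes to U+1F604 alone, whereas the behaviour reported in the issue corresponds to an additional U+FFFD being emitted after it.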




[jira] [Commented] (COUCHDB-3173) Views return corrupt data for text fields containing non-BMP characters

2016-10-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545538#comment-15545538
 ] 

ASF GitHub Bot commented on COUCHDB-3173:
-

GitHub user davisp opened a pull request:

https://github.com/apache/couchdb-couch/pull/202

Fix CouchJS character replacement

This was a bad backport from an old bug. We accidentally backed up when
looking at the second half of a surrogate pair. Instead the backup
should only happen when we see a low half of a surrogate pair with no
preceding high half.

COUCHDB-3173

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloudant/couchdb-couch 3173-fix-couchjs-character-replacement

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/couchdb-couch/pull/202.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #202


commit 37d3778172ca354f124334edf13bc09d9abc28bc
Author: Paul J. Davis 
Date:   2016-10-04T14:45:36Z

Fix CouchJS character replacement

This was a bad backport from an old bug. We accidentally backed up when
looking at the second half of a surrogate pair. Instead the backup
should only happen when we see a low half of a surrogate pair with no
preceding high half.

COUCHDB-3173






[jira] [Commented] (COUCHDB-3173) Views return corrupt data for text fields containing non-BMP characters

2016-10-04 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545522#comment-15545522
 ] 

Paul Joseph Davis commented on COUCHDB-3173:


Fixed. PR incoming.



[jira] [Commented] (COUCHDB-3173) Views return corrupt data for text fields containing non-BMP characters

2016-10-04 Thread Paul Joseph Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545511#comment-15545511
 ] 

Paul Joseph Davis commented on COUCHDB-3173:


Here's a simpler reproducer:

https://gist.github.com/davisp/3cc1a0e5b0de04a3c027f694d5a4bc31

The contents of the gist are pasted below for posterity, but I dunno how well 
Jira and Chrome will store the raw byte values:

repro.js:

["reset", {"reduce_limit":"true", "timeout":5000}]
["add_fun", "function(doc){if(doc.type===\"foo\")emit(doc._id,doc);}"]
["map_doc", {"_id":"foo2","_rev":"1-d7da3cd352ef74f6391cc13601081214","type":"foo","value":"😄"}]

run.sh:

cat repro.js | ./bin/couchjs share/server/main.js

Should have a fix in a few minutes if I'm lucky.
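
Because a pasted copy may not preserve the raw bytes, the input file can also be regenerated programmatically. The following is a hedged Python sketch, not taken from the gist; `build_repro_lines` and `write_repro` are hypothetical helpers. It assumes the value is U+1F604, as in the issue description, and emits it as raw UTF-8 rather than as `\ud83d\ude04` escapes.

```python
import json

SMILEY = "\U0001F604"  # the non-BMP character named in the issue description


def build_repro_lines(value=SMILEY):
    """Return the three view-server commands as UTF-8 bytes, one per line."""
    doc = {
        "_id": "foo2",
        "_rev": "1-d7da3cd352ef74f6391cc13601081214",
        "type": "foo",
        "value": value,
    }
    commands = [
        ["reset", {"reduce_limit": "true", "timeout": 5000}],
        ["add_fun", 'function(doc){if(doc.type==="foo")emit(doc._id,doc);}'],
        ["map_doc", doc],
    ]
    # ensure_ascii=False keeps the character as raw bytes instead of \u escapes.
    text = "".join(json.dumps(cmd, ensure_ascii=False) + "\n" for cmd in commands)
    return text.encode("utf-8")


def write_repro(path="repro.js"):
    """Write the commands to a file in binary mode so no re-encoding happens."""
    with open(path, "wb") as handle:
        handle.write(build_repro_lines())
```

After calling `write_repro()`, the file can be piped into couchjs with the run.sh command shown above.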
