On Tue, Jun 30, 2009 at 8:18 PM, Nitin Borwankar <[email protected]> wrote:
> Hi Damien,
>
> Thanks for that tip.
>
> Turns out I had non-UTF-8 data
>
> adolfo.steiger-gar%E7%E3o:
>
> - not sure how it managed to get into the db.
>
> This is probably confusing the chunk termination.
>
> How did Couch let this data in ?

Currently CouchDB doesn't validate JSON string contents on input, only on
output. Adding an option to block invalid Unicode input would be a small
patch, but it would perhaps slow things down, as we'd have to spend more
time in the encoder while writing. Worth measuring, I suppose.
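In the meantime, the workaround is to validate on the client side before you
write. A rough, untested sketch in Python 2 (to match your httplib upload;
the helper here is made up, not part of any library):

    import urllib

    def assert_utf8(value):
        # Byte strings must decode cleanly as UTF-8; unicode objects are fine.
        if isinstance(value, str):
            value.decode('utf-8')   # raises UnicodeDecodeError on bad bytes

    # The problem id from your db looks like Latin-1 rather than UTF-8:
    doc_id = urllib.unquote('adolfo.steiger-gar%E7%E3o')  # -> '...gar\xe7\xe3o'
    try:
        assert_utf8(doc_id)
    except UnicodeDecodeError:
        # Re-encode (or reject) the value before uploading the doc.
        doc_id = doc_id.decode('latin-1').encode('utf-8')

Run every string in a doc through a check like that before the PUT and the
bad bytes never reach the db.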
Is this something users are running into a lot? I've heard this once before;
if lots of people are seeing it, it's definitely worth fixing.
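For what it's worth, my guess at why bad bytes surface as chunked-transfer
errors: each chunk's length is declared in bytes up front, and if the encoder
then emits a different number of bytes than were counted, the client looks
for the chunk-terminating CRLF in the wrong place. A toy illustration in
plain Python (nothing to do with CouchDB's actual Erlang internals; the
function names are made up):

    # Each chunk on the wire is "<size-in-hex>\r\n<size bytes>\r\n".
    def chunk(declared_size, body):
        return '%x\r\n%s\r\n' % (declared_size, body)

    def read_chunk(data):
        size_line, rest = data.split('\r\n', 1)
        size = int(size_line, 16)
        if rest[size:size + 2] != '\r\n':
            # The same failure HttpClient reports below as
            # "CRLF expected at end of chunk".
            raise IOError('CRLF expected at end of chunk')
        return rest[:size]

    read_chunk(chunk(5, 'hello'))         # fine: CRLF is right where promised
    read_chunk(chunk(5, 'hello, world'))  # IOError: size and body disagree

A mismatch like that is most likely what both curl and the couchdb-lucene
indexer are tripping over on your _all_docs response.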
> I uploaded via Python httplib - not couchdb-python. Is this a bug - the
> one that is fixed in 0.9.1?
>
> Nitin
>
> 37% of all statistics are made up on the spot
> -------------------------------------------------------------------------------------
> Nitin Borwankar
> [email protected]
>
>
> On Tue, Jun 30, 2009 at 8:58 AM, Damien Katz <[email protected]> wrote:
>
>> This might be the json encoding issue that Adam fixed.
>>
>> The 0.9.x branch, which is soon to be 0.9.1, fixes that issue. Try
>> building and installing from the branch and see if that fixes the problem:
>> svn co http://svn.apache.org/repos/asf/couchdb/branches/0.9.x/
>>
>> -Damien
>>
>>
>> On Jun 30, 2009, at 12:15 AM, Nitin Borwankar wrote:
>>
>>> Oh and when I use Futon and try to browse the docs around where curl
>>> gives an error, when I hit the page containing the records around the
>>> error Futon just spins and doesn't render the page.
>>>
>>> Data corruption?
>>>
>>> Nitin
>>>
>>> 37% of all statistics are made up on the spot
>>> -------------------------------------------------------------------------------------
>>> Nitin Borwankar
>>> [email protected]
>>>
>>>
>>> On Mon, Jun 29, 2009 at 9:11 PM, Nitin Borwankar <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I uploaded about 11K + docs total 230MB or so of data to a 0.9
>>>> instance on Ubuntu.
>>>> Db name is 'plist'
>>>>
>>>> curl http://localhost:5984/plist gives
>>>>
>>>> {"db_name":"plist","doc_count":11036,"doc_del_count":0,"update_seq":11036,"purge_seq":0,
>>>> "compact_running":false,"disk_size":243325178,"instance_start_time":"1246228896723181"}
>>>>
>>>> suggesting a non-corrupt db
>>>>
>>>> curl http://localhost:5984/plist/_all_docs gives
>>>>
>>>> {"id":"adnanmoh","key":"adnanmoh","value":{"rev":"1-663736558"}},
>>>> {"id":"adnen.chockri","key":"adnen.chockri","value":{"rev":"1-1209124545"}},
>>>> curl: (56) Received problem 2 in the chunky parser   <<--------- note curl error
>>>> {"id":"ado.adamu","key":"ado.adamu","value":{"rev":"1-4226951654"}}
>>>>
>>>> suggesting a chunked data transfer error
>>>>
>>>> couchdb-lucene error message in couchdb.stderr reads
>>>>
>>>> [...]
>>>>
>>>> [couchdb-lucene] INFO Indexing plist from scratch.
>>>> [couchdb-lucene] ERROR Error updating index.
>>>> java.io.IOException: CRLF expected at end of chunk: 83/101
>>>>     at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:207)
>>>>     at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
>>>>     at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
>>>>     at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
>>>>     at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
>>>>     at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
>>>>     at java.io.FilterInputStream.close(FilterInputStream.java:159)
>>>>     at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:194)
>>>>     at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
>>>>     at com.github.rnewson.couchdb.lucene.Database.execute(Database.java:141)
>>>>     at com.github.rnewson.couchdb.lucene.Database.get(Database.java:107)
>>>>     at com.github.rnewson.couchdb.lucene.Database.getAllDocsBySeq(Database.java:82)
>>>>     at com.github.rnewson.couchdb.lucene.Index$Indexer.updateDatabase(Index.java:229)
>>>>     at com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:178)
>>>>     at com.github.rnewson.couchdb.lucene.Index$Indexer.run(Index.java:90)
>>>>     at java.lang.Thread.run(Thread.java:595)
>>>>
>>>> suggesting a chunking problem again.
>>>>
>>>> Who is creating this problem - my data? CouchDB chunking?
>>>>
>>>> Help?
>>>>
>>>>
>>>> 37% of all statistics are made up on the spot
>>>> -------------------------------------------------------------------------------------
>>>> Nitin Borwankar
>>>> [email protected]
>>>>
>>

--
Chris Anderson
http://jchrisa.net
http://couch.io
