[jira] Updated: (XERCESC-1936) ICUTransService and IconvGNUTransService CAN NOT deal with huge file.

2010-09-07 Thread Boris Kolpackov (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Kolpackov updated XERCESC-1936:
-

Fix Version/s: 3.1.2
   3.2.0
   4.0.0

Yes, I just tried your test with ICU and I get the error. Scheduling this bug 
for the next release.

 ICUTransService and IconvGNUTransService CAN NOT deal with huge file.
 

 Key: XERCESC-1936
 URL: https://issues.apache.org/jira/browse/XERCESC-1936
 Project: Xerces-C++
  Issue Type: Bug
  Components: Utilities
Affects Versions: 2.8.0, 3.1.1
 Environment: RHEL-5.5
 glibc-2.5-49.el5_5.2
 libicu-3.6-5.11.4
Reporter: kirby zhou
 Fix For: 3.1.2, 3.2.0, 4.0.0


 If a huge file is passed to XMLReader, it calls TransService multiple times 
 and splits the file content into several fragments.
 Unfortunately, a fragment can end with an incomplete multi-byte character,
 and neither ICUTransService nor IconvGNUTransService handles this case: 
 ICUTransService does not handle U_TRUNCATED_CHAR_FOUND, and 
 IconvGNUTransService does not handle EINVAL.
 Both 2.8.0 and 3.1.1 have the same bug.
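
The fragment problem described above can be illustrated with a minimal C++ sketch. It is not the Xerces-C++ code and not the eventual fix; it only shows the general carry-over technique: transcode just the bytes that form complete characters and keep the incomplete tail for the next fragment. The function names `completePrefixLength` and `takeCompleteChars` are hypothetical, and GBK is simplified to "a lead byte in 0x81-0xFE is followed by exactly one trail byte". For context, POSIX `iconv` reports EINVAL when the input ends in an incomplete multi-byte sequence, which is the same situation this sketch guards against.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the longest prefix of buf that ends on a character boundary.
// Assumption: simplified GBK, where a byte in 0x81-0xFE starts a two-byte
// character and every other byte (including 0x80 and 0xFF) stands alone.
std::size_t completePrefixLength(const std::string& buf) {
    std::size_t i = 0;
    while (i < buf.size()) {
        const unsigned char b = static_cast<unsigned char>(buf[i]);
        const bool isLead = (b >= 0x81 && b <= 0xFE);
        if (!isLead) {
            ++i;                      // single-byte character
            continue;
        }
        if (i + 1 >= buf.size()) {
            break;                    // lead byte whose trail byte has not arrived yet
        }
        i += 2;                       // complete two-byte character
    }
    return i;
}

// Hypothetical helper: joins the bytes carried over from the previous
// fragment with the new fragment, returns the part that is safe to
// transcode, and stores the incomplete tail (if any) back into carry.
std::string takeCompleteChars(std::string& carry, const std::string& fragment) {
    const std::string data = carry + fragment;
    const std::size_t n = completePrefixLength(data);
    assert(n <= data.size());
    carry = data.substr(n);
    return data.substr(0, n);
}
```

A caller would keep one `carry` string per input stream, pass each raw fragment through `takeCompleteChars`, and hand only the returned bytes to the transcoder. A non-empty `carry` at end of input indicates a truncated file.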
 For example, create two XML files like this:
 ]# ( echo '<?xml version="1.0" encoding="GBK" ?>'; echo '<data>'; for ((i=0;i<2;++i)); do echo -n '中文汉字A'; done ; echo; echo '</data>' ) > ~/small.xml
 ]# ( echo '<?xml version="1.0" encoding="GBK" ?>'; echo '<data>'; for ((i=0;i<10;++i)); do echo -n '中文汉字A'; done ; echo; echo '</data>' ) > ~/big.xml
 # small.xml and big.xml are analogous; they differ only in the number of repetitions.
 ]# samples/SAXPrint -x=gbk ~/small.xml 
 <?xml version="1.0" encoding="gbk"?>
 <data>
 中文汉字A中文汉字A
 </data>
 # with icu
 ]# samples/SAXPrint -x=gbk ~/big.xml
 <?xml version="1.0" encoding="gbk"?>
 <data>
 Fatal Error at file /root/big.xml, line 3, char 16377
   Message: char 0x6C49 is not representable in 'gbk' encoding
 # with iconvgnu
 ]# samples/SAXPrint -x=gbk ~/big.xml 
 <?xml version="1.0" encoding="gbk"?>
 <data>
 Fatal Error at file /root/big.xml, line 3, char 16377
   Message: invalid multi-byte sequence
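
Because shell `echo` writes the Chinese text in whatever encoding the terminal uses, the reproduction files above are only valid GBK if the shell itself runs in a GBK locale. The following C++ sketch builds the same document from explicit GBK bytes instead. The names `makeGbkXml` and `writeFile` and the `repeats` parameter are hypothetical, introduced only for illustration; the repetition count needed to trigger the error is left to the caller.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Builds the test document as raw GBK bytes.
// repeats: how many times the text "中文汉字A" appears inside <data>.
std::string makeGbkXml(std::size_t repeats) {
    // "中文汉字A" in GBK: D6D0 CEC4 BABA D7D6, followed by ASCII 'A'.
    const std::string unit("\xD6\xD0\xCE\xC4\xBA\xBA\xD7\xD6" "A", 9);
    std::string xml = "<?xml version=\"1.0\" encoding=\"GBK\" ?>\n<data>\n";
    for (std::size_t i = 0; i < repeats; ++i) {
        xml += unit;
    }
    xml += "\n</data>\n";
    return xml;
}

// Writes the bytes unchanged (binary mode, no newline translation).
// Returns true if the stream is still in a good state after writing.
bool writeFile(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

For example, `writeFile("small.xml", makeGbkXml(2))` would produce a file equivalent to small.xml regardless of the terminal's locale.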

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org



[jira] Updated: (XERCESC-1936) ICUTransService and IconvGNUTransService CAN NOT deal with huge file.

2010-08-03 Thread Boris Kolpackov (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Kolpackov updated XERCESC-1936:
-


Hi,

Can you attach the sample files to the bug report? The content that you have 
pasted in the description is all garbled. Also, would you be able to come up 
with a patch for this issue?
