Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-06 Thread Raymond Wiker
Ok, let me rephrase that slightly: does your database extraction include BLOBs or CLOBs that are actually complete documents, that might be UTF-8 encoded text? From the stack trace in your second post, it seems that the error occurs while parsing an XML file uploaded via the UpdateRequestHandler.

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-06 Thread Federico Chiacchiaretta
2013/8/6 Raymond Wiker rwi...@gmail.com Ok, let me rephrase that slightly: does your database extraction include BLOBs or CLOBs that are actually complete documents, that might be UTF-8 encoded text? It definitely does, each entry I have in PostgreSQL has a field of type text that include

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Federico Chiacchiaretta
Hi, I reproduced the bug on solr 4.4.0. The bug is specific to SolrCloud, so the bug occurs only when data has to be forwarded to another node (say I start dataimport on node1 and it forwards data to node2). Here is the log I found on target node: ERROR - 2013-08-05 11:57:48.739;

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Shawn Heisey
On 8/1/2013 7:20 AM, Federico Chiacchiaretta wrote: on data import from a PostgreSQL db, I get the following error in solr.log: ERROR - 2013-08-01 09:51:00.217; org.apache.solr.common.SolrException; shard update error RetryNode:

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Federico Chiacchiaretta
Hi Shawn, thanks for your answer. From the docs you linked I found: This property is only relevant for server versions less than or equal to 7.2. I'm using version 9.1, I gave it a try but unfortunately I had no luck. Besides, I checked encoding settings on DB and it's UTF-8. Please note that

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Raymond Wiker
I think #xfffe is special; it is used as a byte order mark to identify the encoding used. In that case, it should only appear at the beginning of the document. Sent from my iPhone On 5 Aug 2013, at 17:19, Federico Chiacchiaretta federico.c...@gmail.com wrote: Hi Shawn, thanks for your

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Federico Chiacchiaretta
Hi Raymond, I agree with you, 0xfffe is a special character, that is why I was asking how it's handled in solr. In my document, 0xfffe does not appear at the beginning, it's in the content. Just an update about testing I'm doing: in a SolrCloud two shards environment, if I launch dataimport on

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Chris Hostetter
: I agree with you, 0xfffe is a special character, that is why I was asking : how it's handled in solr. : In my document, 0xfffe does not appear at the beginning, it's in the : content. Unless I'm misunderstanding something (and it's very likely that I am)... 0xfffe is not a special character

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Robert Muir
On Mon, Aug 5, 2013 at 11:42 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I agree with you, 0xfffe is a special character, that is why I was asking : how it's handled in solr. : In my document, 0xfffe does not appear at the beginning, it's in the : content. Unless I'm

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Shawn Heisey
On 8/5/2013 12:12 PM, Federico Chiacchiaretta wrote: Hi Raymond, I agree with you, 0xfffe is a special character, that is why I was asking how it's handled in solr. In my document, 0xfffe does not appear at the beginning, it's in the content. I believe that 0xfffe is not a valid UTF-8 character,

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Chris Hostetter
: 0xfffe is not a special character -- it is explicitly *not* a character in : Unicode at all, it is set aside as not a character, specifically so : that the character 0xfeff can be used as a BOM, and if the BOM is read : incorrectly, it will cause an error. : : XML doesn't allow control

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Steve Rowe
Unicode noncharacters are perfectly valid for the purpose of interchange (though as Robert points out, XML has its own ideas about this, separately from the Unicode standard). From http://www.unicode.org/faq/private_user.html: Q: Are noncharacters invalid in Unicode strings and UTFs?
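To illustrate the distinction drawn in this message, here is a minimal Python sketch (not from the thread). It assumes the `Char` production of the XML 1.0 specification as the definition of "allowed in XML"; the helper names `utf8_round_trips` and `is_xml10_char` are hypothetical. It suggests that U+FFFE encodes and decodes cleanly as UTF-8 while still falling outside what XML 1.0 permits.

```python
import re

# XML 1.0 "Char" production (assumed here as the definition of a legal XML character):
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML10_CHAR = re.compile(
    r"[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def utf8_round_trips(ch: str) -> bool:
    """Return True if the character survives a UTF-8 encode/decode cycle."""
    return ch.encode("utf-8").decode("utf-8") == ch


def is_xml10_char(ch: str) -> bool:
    """Return True if the single character matches the XML 1.0 Char production."""
    return XML10_CHAR.fullmatch(ch) is not None


if __name__ == "__main__":
    for ch in ("\uFFFE", "\uFFFD", "\uFEFF"):
        print(f"U+{ord(ch):04X}", utf8_round_trips(ch), is_xml10_char(ch))
```

Under these assumptions the noncharacter is fine as far as the UTF is concerned, and it is the XML layer that objects.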

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Robert Muir
On Mon, Aug 5, 2013 at 3:03 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : 0xfffe is not a special character -- it is explicitly *not* a character in : Unicode at all, it is set aside as not a character, specifically so : that the character 0xfeff can be used as a BOM, and if the

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Sundararaju, Shankar
The problem is that even though the Unicode code points \u and \uFFFE are valid in UTF-8, they will not be parsed by standards-conforming XML parsers. There is a Unicode replacement character, \uFFFD, that can be used to replace such characters. While indexing docs, replace all such
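The replacement approach suggested above could be sketched roughly as follows. This is an illustration only, not code from the thread or from Solr: the function name `replace_invalid_xml_chars` is hypothetical, and the set of "invalid" characters is assumed to be everything outside the XML 1.0 `Char` production. It would run on field values before they are sent for indexing.

```python
import re

# Everything NOT matched by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# This covers U+FFFE, U+FFFF, and most C0 control characters.
_NOT_XML10_CHAR = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def replace_invalid_xml_chars(text: str, replacement: str = "\uFFFD") -> str:
    """Replace each character that XML 1.0 disallows with the replacement character."""
    return _NOT_XML10_CHAR.sub(replacement, text)


if __name__ == "__main__":
    dirty = "ab\uFFFEcd"
    clean = replace_invalid_xml_chars(dirty)
    print([f"U+{ord(c):04X}" for c in clean])
```

Replacing rather than deleting keeps character offsets stable, at the cost of leaving a visible U+FFFD in the indexed text; passing `replacement=""` would strip the characters instead.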

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Raymond Wiker
On Aug 5, 2013, at 20:12 , Federico Chiacchiaretta federico.c...@gmail.com wrote: Hi Raymond, I agree with you, 0xfffe is a special character, that is why I was asking how it's handled in solr. In my document, 0xfffe does not appear at the beginning, it's in the content. Just an update

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-05 Thread Federico Chiacchiaretta
No, the content has no XML tags included (hope I understood what you were asking here). Federico 2013/8/5 Raymond Wiker rwi...@gmail.com On Aug 5, 2013, at 20:12 , Federico Chiacchiaretta federico.c...@gmail.com wrote: Hi Raymond, I agree with you, 0xfffe is a special character, that is

Invalid UTF-8 character 0xfffe during shard update

2013-08-01 Thread Federico Chiacchiaretta
Hi list, on data import from a PostgreSQL db, I get the following error in solr.log: ERROR - 2013-08-01 09:51:00.217; org.apache.solr.common.SolrException; shard update error RetryNode: http://172.16.201.173:8983/solr/archive/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: