Re: Fw: TolerantUpdateProcessorFactory not functioning
There was another error, which I think should count as an indexing error. The listprice field below is a pdouble field, and the update processor didn't ignore the error when it was sent bad data.

Response:

    {
      "responseHeader":{
        "status":400,
        "QTime":133551},
      "error":{
        "metadata":[
          "error-class","org.apache.solr.common.SolrException",
          "root-error-class","java.lang.NumberFormatException"],
        "msg":"ERROR: [doc=978194537913] Error adding field 'listprice'='106Chapter' msg=For input string: \"106Chapter\"",
        "code":400}}

________
From: Shawn Heisey
Sent: Tuesday, June 9, 2020 3:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Fw: TolerantUpdateProcessorFactory not functioning

> The tolerant update processor can't ignore errors in the actual format of
> the input ... it only ignores errors during *indexing*.
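For the listprice failure reported at the top of this message, one client-side workaround is to screen numeric fields before posting. The following is only a sketch under stated assumptions: the helper name `split_by_numeric_field` is hypothetical, the documents are assumed to be plain dicts, and the pattern is deliberately stricter than the Java number parsing that raised the NumberFormatException, so it may reject values Solr would accept. It does not explain why the update processor failed to tolerate the error.

```python
import re

# Conservative pattern for plain decimal numbers such as "106", "-3.5" or "1e3".
# Assumption: stricter than Java's number parsing, so some values that Solr
# would accept for a pdouble field are rejected here.
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def split_by_numeric_field(docs, field="listprice"):
    """Return (good, bad); bad docs carry a non-numeric value in `field`."""
    good, bad = [], []
    for doc in docs:
        value = doc.get(field)
        # A missing field is left for Solr to decide on; only bad text is rejected.
        if value is None or _DECIMAL.fullmatch(str(value).strip()):
            good.append(doc)
        else:
            bad.append(doc)
    return good, bad
```

The rejected documents can be logged and fixed separately while the remaining ones are sent to Solr.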
Re: Fw: TolerantUpdateProcessorFactory not functioning
Oh, I get it: that's not an indexing error! It seems I need to strip all characters in the range [\x00-\x1F] (except \x09 TAB, \x0A LF, and \x0D CR) first.

Thanks a lot!

From: Shawn Heisey
Sent: Tuesday, June 9, 2020 3:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Fw: TolerantUpdateProcessorFactory not functioning

> A Ctrl-Z character (ascii code 26) is *NOT* a valid character for XML,
> which is why you're getting the error.
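The cleanup described in this message could look roughly like the sketch below. The function name `strip_illegal_xml_controls` is hypothetical, and the sketch covers only the control-character range mentioned here; XML 1.0 also excludes a few other code points (for example U+FFFE and U+FFFF) that it does not touch.

```python
import re

# C0 control characters that XML 1.0 does not allow: everything in
# \x00-\x1F except TAB (\x09), LF (\x0A) and CR (\x0D).
_ILLEGAL_XML_CONTROLS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def strip_illegal_xml_controls(text: str) -> str:
    """Remove control characters that would make an XML parser reject the input."""
    return _ILLEGAL_XML_CONTROLS.sub("", text)
```

Running each field value through such a filter before building the XML removes a stray Ctrl-Z (code 26) while keeping tabs and line breaks.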
Re: Fw: TolerantUpdateProcessorFactory not functioning
On 6/9/2020 12:44 AM, Hup Chen wrote:
> Thanks for your reply. This is one of the examples where it fails. POSTing
> with charset=utf-8 or another charset didn't help with the CTRL-CHAR "^"
> error found in the title field. I hope Solr can simply skip this record
> and go on to index the rest of the data.

I tried your example XML as it is shown in your original message, saved to a file named "foo.xml", and didn't have any trouble. I wasn't even using the tolerant update processor.

I just fired up the techproducts example on a solr-8.3.0 download I already had, added a field named "isbn13" (string type) so the schema was compatible, and tried the following command:

    curl "http://localhost:8983/solr/techproducts/update" -H 'Content-Type: text/xml; charset=utf-8' -d @foo.xml

I then tried it again with the ^Z (which is two characters) replaced by an actual Ctrl-Z character. When I did that, I got exactly the same error you did.

A Ctrl-Z character (ascii code 26) is *NOT* a valid character for XML, which is why you're getting the error.

The tolerant update processor can't ignore errors in the actual format of the input ... it only ignores errors during *indexing*. This error occurred during the input parsing, not during indexing, so the update processor could not ignore it.

Thanks,
Shawn
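The difference between the two-character "^Z" and a real Ctrl-Z can be reproduced without Solr. The sketch below is an illustration only: it uses Python's built-in XML parser rather than the Woodstox parser named in the error, the helper `parses_as_xml` is hypothetical, and the sample document is a made-up one-field record in Solr's `<add><doc>` update format.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(payload: str) -> bool:
    """Return True if the payload is well-formed XML, False otherwise."""
    try:
        ET.fromstring(payload)
        return True
    except ET.ParseError:
        return False


# Caret followed by "Z": two ordinary printable characters.
caret_version = (
    '<add><doc><field name="title">'
    "Innocent By Association^Zachary's Law"
    "</field></doc></add>"
)

# The same text with "^Z" replaced by an actual Ctrl-Z (code 26).
ctrl_z_version = caret_version.replace("^Z", "\x1a")

print(parses_as_xml(caret_version))
print(parses_as_xml(ctrl_z_version))
```

The first document parses and the second is rejected by the parser itself, before any document could be handed to an update chain.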
Re: Fw: TolerantUpdateProcessorFactory not functioning
Thanks for your reply. This is one of the examples where it fails. POSTing with charset=utf-8 or another charset didn't help with the CTRL-CHAR "^" error found in the title field. I hope Solr can simply skip this record and go on to index the rest of the data.

    9780373773244
    9780373773244
    Missing: Innocent By Association^Zachary's Law (Hqn Romance)
    Lisa_Jackson

    curl "http://localhost:7070/solr/searchinfo/update?update.chain=tolerant-chain&maxErrors=100" -H 'Content-Type: text/xml; charset=utf-8' -d @data

    maxErrors: 100
    status: 400
    QTime: 0
    error-class: org.apache.solr.common.SolrException
    root-error-class: com.ctc.wstx.exc.WstxUnexpectedCharException
    msg: Illegal character ((CTRL-CHAR, code 26)) at [row,col {unknown-source}]: [1,225]
    code: 400

From: Thomas Corthals
Sent: Tuesday, June 9, 2020 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Fw: TolerantUpdateProcessorFactory not functioning

> If your XML or JSON can't be parsed, your content never makes it to the
> update chain. It looks like you're trying to index non-UTF-8 data.
Re: Fw: TolerantUpdateProcessorFactory not functioning
If your XML or JSON can't be parsed, your content never makes it to the update chain.

It looks like you're trying to index non-UTF-8 data. You can set the encoding of your XML in the Content-Type header of your POST request.

    -H 'Content-Type: text/xml; charset=GB18030'

JSON only allows UTF-8, UTF-16 or UTF-32.

Best,

Thomas

On Tue, Jun 9, 2020 at 07:11, Hup Chen wrote:

> Any idea?
> I still won't be able to get TolerantUpdateProcessorFactory working, solr
> exited at any error without any tolerance, any suggestions will be
> appreciated.
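As an alternative to declaring a non-UTF-8 charset in the Content-Type header, the data can be converted to UTF-8 before it is posted, which is also what JSON input needs. This is a minimal sketch, assuming the source encoding is known in advance; both helper names are hypothetical.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode `raw` from `source_encoding` and re-encode it as UTF-8."""
    # Decoding is strict by default: bytes that are invalid in the source
    # encoding raise UnicodeDecodeError instead of being silently corrupted.
    return raw.decode(source_encoding).encode("utf-8")


def content_type_header(mime: str, charset: str) -> dict:
    """Build the Content-Type header for a payload in the given charset."""
    return {"Content-Type": f"{mime}; charset={charset}"}
```

For example, `to_utf8(data, "gb18030")` paired with `content_type_header("text/xml", "utf-8")` keeps the payload and the declared charset consistent.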
Fw: TolerantUpdateProcessorFactory not functioning
Any idea? I still can't get TolerantUpdateProcessorFactory working. Solr exits at the first error without any tolerance. Any suggestions would be appreciated.

    curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" -d @data.xml

    maxErrors: 100
    status: 400
    QTime: 1
    error-class: org.apache.solr.common.SolrException
    root-error-class: com.ctc.wstx.exc.WstxEOFException
    msg: Unexpected EOF; was expecting a close tag for element field at [row,col {unknown-source}]: [1,8191]
    code: 400

From: Hup Chen
Sent: Friday, May 29, 2020 7:29 PM
To: solr-user@lucene.apache.org
Subject: TolerantUpdateProcessorFactory not functioning

Hi,

My Solr indexing did not tolerate a bad record but simply exited, even though I have configured TolerantUpdateProcessorFactory in solrconfig.xml. Please advise how I can get TolerantUpdateProcessorFactory working.

solrconfig.xml:

    maxErrors: 100

I restarted Solr before indexing:

    service solr stop
    service solr start

    curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" -d @test.json

The first record in test.json is a bad record; the rest were not indexed.

    {
      "responseHeader":{
        "errors":[{
            "type":"ADD",
            "id":"0007264097",
            "message":"ERROR: [doc=0007264097] Error adding field 'usedshipping'='' msg=empty String"}],
        "maxErrors":100,
        "status":400,
        "QTime":0},
      "error":{
        "metadata":[
          "error-class","org.apache.solr.common.SolrException",
          "root-error-class","org.apache.solr.common.SolrException"],
        "msg":"Cannot parse provided JSON: Expected key,value separator ':': char=\",position=1240 AFTER='isbn\":\"4032171203\", \"sku\":\"\", \"title\":\"ãã³ãã¡ã¡ããã³ã \"author\"' BEFORE=':\"Sachiko OÃtomo\", ãã, \"ima'",
        "code":400}}
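The "Cannot parse provided JSON" message in the response above is a parse failure of the whole request, not a per-document error. A payload can be checked for that kind of problem before it is posted. The sketch below is one possible pre-flight check, not part of Solr: the function name `find_payload_problem` is hypothetical, and it accepts only UTF-8, although JSON may also be sent as UTF-16 or UTF-32.

```python
import json


def find_payload_problem(raw: bytes):
    """Return None if `raw` is UTF-8 encoded, parseable JSON; else a description."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that is not valid UTF-8.
        return f"not valid UTF-8 at byte {exc.start}"
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"JSON syntax error at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None
```

Something like `find_payload_problem(open("test.json", "rb").read())` then points at the byte or line to repair before the file is sent.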