Re: Fw: TolerantUpdateProcessorFactory not functioning

2020-06-10 Thread Hup Chen

There was another error, which I think should count as an indexing error.
The listprice field below is a pdouble field, yet the update processor didn't
ignore the error when it was sent bad data.

Response: {
  "responseHeader":{
    "status":400,
    "QTime":133551},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=978194537913] Error adding field 'listprice'='106Chapter' msg=For input string: \"106Chapter\"",
    "code":400}}




Re: Fw: TolerantUpdateProcessorFactory not functioning

2020-06-09 Thread Hup Chen
Oh, I get it: that's not an indexing error!
It seems I need to remove all the characters in the range [\x00-\x1F] (except
\x09 TAB, \x0A LF, \x0D CR) first.
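
A minimal sketch of that cleanup step in Python, assuming the input has already been decoded to text. The function name strip_invalid_xml_chars is hypothetical. The character ranges are the ones XML 1.0 permits, so anything outside them (including Ctrl-Z, code 26) is dropped before the data is posted to Solr.

```python
import re

# Everything XML 1.0 does NOT allow: control characters other than
# TAB, LF and CR, plus surrogates and the non-characters U+FFFE/U+FFFF.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with characters that are illegal in XML 1.0 removed."""
    return _INVALID_XML_CHARS.sub("", text)


# The title from this thread, with a real Ctrl-Z (0x1A) in the middle.
title = "Missing: Innocent By Association\x1aZachary's Law (Hqn Romance)"
cleaned = strip_invalid_xml_chars(title)
```

Removing the character silently changes the data, so replacing it with a space or logging the affected record may be preferable in practice.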

Thanks a lot!







Re: Fw: TolerantUpdateProcessorFactory not functioning

2020-06-09 Thread Shawn Heisey





I tried your example XML as it is shown in your original message, saved 
to a file named "foo.xml", and didn't have any trouble.  I wasn't even 
using the tolerant update processor.   I just fired up the techproducts 
example on a solr-8.3.0 download I already had, added a field named 
"isbn13" (string type) so the schema was compatible, and tried the 
following command:


curl "http://localhost:8983/solr/techproducts/update" -H 'Content-Type: text/xml; charset=utf-8' -d @foo.xml


I then tried it again with the ^Z (which is two characters) replaced by 
an actual Ctrl-Z character.  When I did that, I got exactly the same 
error you did.


A Ctrl-Z character (ASCII code 26) is *NOT* a valid character for XML,
which is why you're getting the error.


The tolerant update processor can't ignore errors in the actual format 
of the input ... it only ignores errors during *indexing*.  This error 
occurred during the input parsing, not during indexing, so the update 
processor could not ignore it.
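
The difference can be seen locally with a short Python sketch. It uses the standard library's XML parser rather than the Woodstox parser Solr uses, so the error text differs, but the outcome is the same: a document containing a raw Ctrl-Z is rejected at parse time, before there are any field values to index. The helper name is_well_formed and the sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# "^Z" typed as two ordinary characters parses fine.
good = '<add><doc><field name="title">Association^Zachary</field></doc></add>'
# The same document with an actual Ctrl-Z (0x1A) is not well-formed XML.
bad = good.replace("^Z", "\x1a")

print(is_well_formed(good))
print(is_well_formed(bad))
```

The first check is expected to pass and the second to fail, which mirrors the two curl attempts described above.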


Thanks,
Shawn


Re: Fw: TolerantUpdateProcessorFactory not functioning

2020-06-09 Thread Hup Chen
Thanks for your reply. This is one of the examples where it fails. POSTing with
charset=utf-8 or another charset didn't help with the CTRL-CHAR "^" error found
in the title field. I hope Solr can simply skip this record and go on to index
the rest of the data.



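<add>
<doc>
  <field name="id">9780373773244</field>
  <field name="isbn13">9780373773244</field>
  <field name="title">Missing: Innocent By Association^Zachary's Law (Hqn Romance)</field>
  <field name="author">Lisa_Jackson</field>
</doc>
</add>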





curl "http://localhost:7070/solr/searchinfo/update?update.chain=tolerant-chain&maxErrors=100" -H 'Content-Type: text/xml; charset=utf-8' -d @data






  
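<response>
<lst name="responseHeader">
  <int name="maxErrors">100</int>
  <int name="status">400</int>
  <int name="QTime">0</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">com.ctc.wstx.exc.WstxUnexpectedCharException</str>
  </lst>
  <str name="msg">Illegal character ((CTRL-CHAR, code 26))
 at [row,col {unknown-source}]: [1,225]</str>
  <int name="code">400</int>
</lst>
</response>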






Re: Fw: TolerantUpdateProcessorFactory not functioning

2020-06-09 Thread Thomas Corthals
If your XML or JSON can't be parsed, your content never makes it to the
update chain.

It looks like you're trying to index non-UTF-8 data. You can set the
encoding of your XML in the Content-Type header of your POST request.

-H 'Content-Type: text/xml; charset=GB18030'

JSON only allows UTF-8, UTF-16 or UTF-32.
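
As a rough illustration in Python, the request below declares the charset in the Content-Type header and encodes the body to match. The function name build_update_request and the sample document are assumptions for the sketch. The URL and the update.chain parameter are taken from the commands earlier in this thread. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url: str, xml_text: str, charset: str) -> Request:
    """Build a POST to Solr's update handler with an explicit charset."""
    # urlencode takes care of '&' and quoting, so the query string
    # cannot be mangled by the shell.
    params = urlencode({"update.chain": "tolerant-chain"})
    return Request(
        f"{base_url}?{params}",
        data=xml_text.encode(charset),  # body bytes must match the header
        headers={"Content-Type": f"text/xml; charset={charset}"},
        method="POST",
    )


req = build_update_request(
    "http://localhost:7070/solr/mycore/update",
    '<add><doc><field name="id">1</field></doc></add>',
    "GB18030",
)
```

Passing req to urllib.request.urlopen would send it. The header only helps when the bytes really are in the declared encoding.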

Best,

Thomas



Fw: TolerantUpdateProcessorFactory not functioning

2020-06-08 Thread Hup Chen
Any ideas?
I still can't get TolerantUpdateProcessorFactory working. Solr exits at the
first error without any tolerance. Any suggestions would be appreciated.
curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" -d @data.xml





  
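<response>
<lst name="responseHeader">
  <int name="maxErrors">100</int>
  <int name="status">400</int>
  <int name="QTime">1</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">com.ctc.wstx.exc.WstxEOFException</str>
  </lst>
  <str name="msg">Unexpected EOF; was expecting a close tag for element &lt;field&gt;
 at [row,col {unknown-source}]: [1,8191]</str>
  <int name="code">400</int>
</lst>
</response>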





From: Hup Chen
Sent: Friday, May 29, 2020 7:29 PM
To: solr-user@lucene.apache.org 
Subject: TolerantUpdateProcessorFactory not functioning

Hi,

My Solr indexing did not tolerate a bad record but simply exited, even though I
have configured TolerantUpdateProcessorFactory in solrconfig.xml.
Please advise how I can get TolerantUpdateProcessorFactory working.

solrconfig.xml:

 
   
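<updateRequestProcessorChain name="tolerant-chain">
  <processor class="solr.TolerantUpdateProcessorFactory">
    <int name="maxErrors">100</int>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>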
   
   
 

Restarted Solr before indexing:
service solr stop
service solr start

curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" -d @test.json

The first record in test.json is a bad record; the rest were not indexed.

{
  "responseHeader":{
    "errors":[{
        "type":"ADD",
        "id":"0007264097",
        "message":"ERROR: [doc=0007264097] Error adding field 'usedshipping'='' msg=empty String"}],
    "maxErrors":100,
    "status":400,
    "QTime":0},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Cannot parse provided JSON: Expected key,value separator ':': char=\",position=1240 AFTER='isbn\":\"4032171203\", \"sku\":\"\", \"title\":\"ãã³ãã¡ã¡ããã³ã \"author\"' BEFORE=':\"Sachiko OÃtomo\", ãã, \"ima'",
    "code":400}}