Re: [basex-talk] Validation Module: validate:xsd-report( ) improvement

2018-04-19 Thread Yitzhak Khabinsky
Hi Christian,

The proposed solution works much faster.
It took 11 seconds to run on my machine, which is completely OK.

Thanks a lot.
 
Regards,
Yitzhak Khabinsky
Technical Services Lead 
Millicom International Services LLC
396 Alhambra Circle, Suite 1100
Coral Gables, FL  33134
Skype4B: +1 (305) 445-4172
Tel: (954) 684-8673
yitzhak.khabin...@millicom.com
www.millicom.com
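
For reference, a minimal sketch of how the rebuild approach might be combined with the validation call from the original post. The paths are the ones quoted in this thread; the output file name report.xml and the variable name $stripped are assumptions made for illustration.

```xquery
(: Validate, then rebuild the report without the url attributes. :)
let $xml := 'd:\Temp\CDW\HOME\id4879_BO201801_HomeSubscriberMovementFact.xml'
let $xsd := 'd:\Temp\CDW\HOME\HomeSubscriberMovementFact.xsd'
let $report := validate:xsd-report($xml, $xsd, '1.1')
(: Construct new elements instead of updating a copy of the report. :)
let $stripped := element { node-name($report) } {
  (: Copy every child (status and messages), dropping only @url. :)
  $report/* ! element { node-name() } { @* except @url, text() }
}
return file:write(file:base-dir() || 'report.xml', $stripped)
```

Constructing new elements avoids the large number of delete operations that the copy/modify variant performs, which, going by the timings reported in this thread, is where most of the time was spent.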


Re: [basex-talk] Validation Module: validate:xsd-report( ) improvement

2018-04-19 Thread Christian Grün
Hi Yitzhak,

Thanks for your suggestion. There are two reasons why we’ll probably
need to stick with the existing output format, though:

• Changing the format would introduce incompatibilities with previous
versions (this is something we only do when switching to new major
versions).
• More importantly, an XML file might import other documents (e.g. via
XInclude), so it will not be guaranteed that the URL is always
identical.

I wanted to propose a solution similar to Marco’s. I would have
expected it to be a bit faster, but it’s true that you can save a lot
of time if the number of nodes to be deleted or inserted is that
large. The following query creates a report with 1 million message
elements. It takes 6.5 seconds on my machine (I think this should be
OK):

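  let $report :=
    <report>
      <status>invalid</status>
      {
        (: placeholder attribute values, assumed for illustration :)
        for $i in 1 to 1000000
        return <message level="Error" line="1" column="1" url="file.xml">blablablabla</message>
      }
    </report>
  let $report := element { node-name($report) } {
    $report/* ! element { node-name() } { @* except @url, text() }
  }
  return file:write(file:base-dir() || 'report.xml', $report)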

Hope this helps,
Christian
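
The XInclude concern above could also be handled on the client side. The following sketch groups the messages by their url attribute, so that each URL appears only once even if several documents are involved. The element name source and its url attribute are hypothetical and not part of the BaseX report format; the paths are the ones from the original post.

```xquery
(: Group validation messages by URL so each URL is written once. :)
let $xml := 'd:\Temp\CDW\HOME\id4879_BO201801_HomeSubscriberMovementFact.xml'
let $xsd := 'd:\Temp\CDW\HOME\HomeSubscriberMovementFact.xsd'
let $report := validate:xsd-report($xml, $xsd, '1.1')
return element report {
  (: Keep the status element unchanged. :)
  $report/status,
  for $message in $report/message
  group by $url := string($message/@url)
  (: "source" is a hypothetical wrapper element, one per distinct URL. :)
  return element source {
    attribute url { $url },
    (: After grouping, $message holds all messages that share this URL. :)
    $message ! element message { @* except @url, text() }
  }
}
```

If only a single document is validated, the result contains exactly one source element, which comes close to the structure proposed in the original post without changing the module's default output.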







Re: [basex-talk] Validation Module: validate:xsd-report( ) improvement

2018-04-19 Thread Yitzhak Khabinsky
Hi Marco,

The proposed solution is way too slow.


let $xml := 'd:\Temp\CDW\HOME\id4879_BO201801_HomeSubscriberMovementFact.xml'
let $xsd := 'd:\Temp\CDW\HOME\HomeSubscriberMovementFact.xsd'
let $validate := validate:xsd-report($xml, $xsd, '1.1')
return file:write("output.xml",
  copy $newvalidate := $validate
  modify (delete node $newvalidate//@url)
  return $newvalidate
)

I guess the delete node operation is too heavy.
With it, the query runs for 143 seconds.
Without it, it takes just 11 seconds.

The input XML file size is about 40MB.
The output.xml file has about the same size.

That's why I was proposing to change the default output format of the 
validation.

Regards,
Yitzhak Khabinsky
Technical Services Lead
Millicom International Services LLC
396 Alhambra Circle, Suite 1100
Coral Gables, FL  33134
Skype4B: +1 (305) 445-4172
Tel: (954) 684-8673
yitzhak.khabin...@millicom.com
www.millicom.com

From: Yitzhak Khabinsky
Sent: Thursday, April 19, 2018 2:51 PM
To: m.lett...@gmail.com
Subject: Re: [basex-talk] Validation Module: validate:xsd-report( ) improvement

Hi Marco,

Thanks for the proposed solution.
It works.

But I was referring to the default behavior.
The url attribute is redundant for every message element.


Regards,
Yitzhak Khabinsky
Technical Services Lead
Millicom International Services LLC
396 Alhambra Circle, Suite 1100
Coral Gables, FL  33134
Skype4B: +1 (305) 445-4172
Tel: (954) 684-8673
yitzhak.khabin...@millicom.com
www.millicom.com



Re: [basex-talk] Validation Module: validate:xsd-report( ) improvement

2018-04-19 Thread Marco Lettere

Hi Yitzhak,


Maybe, by slightly rewriting your code, you could remove the unwanted 
attribute, serialize your output to a file and view it in a text 
editor?



let $xml := 'd:\Temp\CDW\HOME\id4879_BO201801_HomeSubscriberMovementFact.xml'
let $xsd := 'd:\Temp\CDW\HOME\HomeSubscriberMovementFact.xsd'
let $validate := validate:xsd-report($xml, $xsd, '1.1')

return file:write("output.xml",
  copy $newvalidate := $validate
  modify (delete node $newvalidate//@url)
  return $newvalidate
)


Regards,

Marco.



[basex-talk] Validation Module: validate:xsd-report( ) improvement

2018-04-19 Thread Yitzhak Khabinsky
Hello,

I am successfully using the BaseX Validation Module, 
along the following lines:
let $xml := 'd:\Temp\CDW\HOME\id4879_BO201801_HomeSubscriberMovementFact.xml'
let $xsd := 'd:\Temp\CDW\HOME\HomeSubscriberMovementFact.xsd'
return validate:xsd-report($xml, $xsd, '1.1')

My XML files are multiple megabytes in size and contain lots of validation 
errors, in the tens or hundreds of thousands.
Behind the scenes, Saxon validator 9.8.0.11 is running.

Unfortunately, the output structure contains a repeating url attribute.
The BaseX output pane cannot present all the errors.
It says: "(Chopped) Results".


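<report>
  <status>invalid</status>
  <message level="Error" line="10" column="26"
    url="file:///D:/Temp/CDW/HOME/id4879_BO201801_HomeSubscriberMovementFact.xml">The
    content "N/A" of element CommercialServiceCode does not match the
    required simple type. Value "N/A" contravenes the enumeration facet
    "R60080-X00162, R60080-X00163, ..." of the type
    Q{http://www.millicom.com}CommercialServiceCodeType</message>
  <message level="Error" line="19" column="23"
    url="file:///D:/Temp/CDW/HOME/id4879_BO201801_HomeSubscriberMovementFact.xml">The
    content "TBD" of element MovementTechnology does not match the
    required simple type. Value "TBD" contravenes the enumeration facet
    "N/A, HFC, GPON, MMDS, FIBER, C..." of the type
    Q{http://www.millicom.com}MovementTechnologyType</message>
  <message level="Error" line="24" column="18"
    url="file:///D:/Temp/CDW/HOME/id4879_BO201801_HomeSubscriberMovementFact.xml">The
    content "-1." of element DownloadSpeed does not match the required
    simple type. Value "-1" contravenes the minExclusive facet "0" of the type
    Q{http://www.millicom.com}DownloadSpeedType</message>
  <message level="Error" line="26" column="6"
    url="file:///D:/Temp/CDW/HOME/id4879_BO201801_HomeSubscriberMovementFact.xml">The
    7th field in constraint {PK} has no value</message>
  ...
</report>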


My proposal is to eliminate the repeated url attribute from each message 
and elevate it to its own element, just once, under the root report tag, 
along the lines of the following output structure:


<report>
  <status>invalid</status>
  <url>file:///D:/Temp/CDW/HOME/id4879_BO201801_HomeSubscriberMovementFact.xml</url>
  <message level="Error" line="10" column="26">The content "N/A" of element
    CommercialServiceCode does not match the required simple type. Value
    "N/A" contravenes the enumeration facet "R60080-X00162, R60080-X00163, ..." of
    the type Q{http://www.millicom.com}CommercialServiceCodeType</message>
  <message level="Error" line="19" column="23">The content "TBD" of element
    MovementTechnology does not match the required simple type. Value "TBD"
    contravenes the enumeration facet "N/A, HFC, GPON, MMDS, FIBER, C..." of the
    type Q{http://www.millicom.com}MovementTechnologyType</message>
  <message level="Error" line="24" column="18">The content "-1." of
    element DownloadSpeed does not match the required simple type. Value
    "-1" contravenes the minExclusive facet "0" of the type
    Q{http://www.millicom.com}DownloadSpeedType</message>
  <message level="Error" line="26" column="6">The 7th field in constraint
    {PK} has no value</message>
  ...
</report>


This way the output of the validation is much more readable and will 
hopefully fit into the output pane in its entirety.

Regards,
Yitzhak Khabinsky
Technical Services Lead
Millicom International Services LLC
396 Alhambra Circle, Suite 1100
Coral Gables, FL  33134
Skype4B: +1 (305) 445-4172
Tel: (954) 684-8673
yitzhak.khabin...@millicom.com
www.millicom.com