Re: [basex-talk] Huge No of XML files.
On Wed, 2019-12-18 at 11:10 +0530, Sreenivasulu Yadavalli wrote:
> > What exactly do you mean by moving collections around?
> A: moving the collections in the same system.

So, you use the Linux "mv" command to do this? Or what? What exactly do you mean by collections? I for one would find it easier if you would stop talking in riddles, as my telepathy skills are weak.

> And every day we have to update the existing collection with call
> data. So finding the collection is taking more time

How do you look for the collection? Isn't it a separate BaseX database?

> > Are you taking a database with 100 million documents and renaming
> > 50,000 of them? What operations exactly are slow?
> A: finding the existing collection.

find / -name collection.db ?

This is a little frustrating in that you are asking for people's help but not explaining the problem. Are you saying that fn:collection() is slow in BaseX? What arguments are you passing it, exactly? What is the size, in gigabytes, of the database on disk? How many documents are in it?

Can you give step-by-step EXACT AND PRECISE instructions so someone else could reproduce the problem you are having? Complete and exact instructions, with sample files if needed, so they can reproduce the problem on their own computer?

A database with 80,000 files is easy to "find" here, and opens quickly, in a small fraction of a second. It doesn't take hours. Is something else running on your computer that makes it slow?

Note: please remember to copy the list in your replies, as the BaseX people are far more knowledgeable about BaseX than i am :) My goal as an analyst is to get you to explain the problem you are having clearly enough that you can get an answer :)

Liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org
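A minimal reproduction along the lines Liam asks for could start by timing the collection access itself with BaseX's profiling functions. This is only a sketch: the database name "calls" and the serial number are placeholders, not names from the thread.

```xquery
(: print the evaluation time of each expression to the query info;
   "calls" and the ser_no value are placeholders :)
prof:time(count(collection("calls"))),
prof:time(count(collection("calls")//row[@ser_no = '12345']))
```

If opening the collection alone takes minutes rather than milliseconds, that already narrows the problem down to the database or the machine, rather than the query logic.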
Re: [basex-talk] Huge No of XML files.
On Tue, 2019-12-17 at 11:48 +0530, Sreenivasulu Yadavalli wrote:
> Every day we are moving collections around 55k to 60k no of xml files
> large account.

Here, i just created a BaseX database with 80,000 XML files. It took under one minute on the Linux desktop system i use.

> Its taking more than 18 hours.

This makes no sense. How much memory do you have on the computer?

What exactly do you mean by moving collections around? Are you taking a database with 100 million documents and renaming 50,000 of them? What operations exactly are slow?

Liam
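For comparison, bulk-creating such a database can be done with a single call to the Database Module (a sketch; the database name and directory path are placeholders, and this is not necessarily how Liam ran his test):

```xquery
(: bulk-create a database from a directory of XML files;
   "test80k" and the path are placeholders :)
db:create("test80k", "/path/to/xml-dir", (),
  map { "intparse": true() })
```

The equivalent CREATE DB command in the BaseX GUI or on the command line does the same thing.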
Re: [basex-talk] Huge No of XML files.
> A: Every day we are generating collections (Unbilled Mobile
> Transaction data). Per account appr. 50k - 55k files.

So you are creating a new collection every day, and it will have around 60k documents at the end of the day? Will you discard older collections? And I assume you need to add the documents incrementally, or do you have all of them at hand before you run the first queries?

In the latter case, you might need to partition your database and work with multiple instances (see [1,2] for details). In both cases, it might be advisable to experiment with the UPDINDEX option [3].

> 4. What do your queries for reporting data look like?

Feel invited to give us more information on the database that you are accessing in your query.

[1] http://docs.basex.org/wiki/Databases
[2] http://docs.basex.org/wiki/Statistics
[3] http://docs.basex.org/wiki/Options#UPDINDEX
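One way to act on the UPDINDEX suggestion is to enable it when the daily database is created, so the index structures stay current while documents are added during the day. A sketch only; the per-day database name and source path are assumptions, not taken from the thread:

```xquery
(: create a per-day database with incremental index updates enabled;
   name and path are placeholders :)
db:create("calls-2019-12-18", "/data/calls/2019-12-18", (),
  map { "updindex": true() })
```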
Re: [basex-talk] Huge No of XML files.
Hi Christian,

1. What is the average size of each of your documents?
A: 5k to 7k

2. Will this sum up to a yearly amount of appr. 20 million files, or do you have an upper limit?
A: Every day we are generating collections (Unbilled Mobile Transaction data). Per account appr. 50k - 55k files.

3. Which of our API(s) are you using to add the documents?
A: import org.basex.server.ClientSession;

4. What do your queries for reporting data look like?
A:

Usage Summary Per Line: 10 minutes

declare function local:getLabelName($serNo as xs:string) {
  let $doc := collection('LabelCollection_1573624392934')
  let $label := $doc//row[@ser_no = $serNo]
  let $lbCount := count($doc//row[@ser_no = $serNo])
  return
    if (data($label) != '' and $lbCount <= 1)
    then data($label)
    else ('')
};

declare function local:getRecords() {
  let $Rows := (
    collection("867509_Voice_OCTOBER-19_Unbilled_1")
      /SUBCUSTBRK[AccNo[@NO = (6945045)]]
        [DOB >= '2019-10-01' and DOB <= '2019-10-31'])
  let $billNo := data($Rows/CONN/@NO)
  let $topRecords := $billNo
  for $details in $Rows[CONN[data(@NO) = $topRecords]]/Country/ROW
  group by $billNo := data($details/../../CONN/@NO)
  (: the literal XML element constructors in this query appear to have
     been stripped by the mailing-list archive :)
  return {
    element Acno { distinct-values(data($details/../../AccNo/@NO)) },
    element BNO { $billNo },
    element SLBl { distinct-values(local:getLabelName($billNo)) },
    element Stat {},
    for $serviceDetails in $details
    group by $groupService := data($serviceDetails/@ServType)
    return (
      element { concat($groupService, 'C') }
        { xs:decimal(sum(data($serviceDetails/@Calls))) }
    ),
    element TCls { xs:decimal(sum(data($details/@Calls))) }
  }
};

(
  let $records := local:getRecords()[position() le 1]
  let $totalCalls := sum($records/TCls)
  let $pstncls := xs:decimal(sum($records/PSTNC))
  let $vpncls := xs:decimal(sum($records/VPNC))
  let $tnbscls := xs:decimal(sum($records/TNBSC))
  let $fmccls := xs:decimal(sum($records/FMCC))
  return (
    $records,
    if (count($records) > 0) then (
      (: the element constructors around this totals row are likewise
         missing from the archived message :)
      { { 'Total' }, , , ,
        { xs:decimal($pstncls) },
        { xs:decimal($vpncls) },
        { xs:decimal($tnbscls) },
        { xs:decimal($fmccls) },
        { xs:decimal($totalCalls) } }
    ) else ()
  )
)

Trend Report by Call type: 4 minutes

declare function local:getDurationPer($ctDur as xs:double, $ctTotalDur as xs:double) {
  if ($ctTotalDur = 0) then () else (
    let $ctPer := concat(
      xs:decimal(round-half-to-even((($ctDur div $ctTotalDur) * 100), 3)), "%")
    return $ctPer)
};

declare function local:getHHMMSSFromNumber($num as xs:double) {
  let $ss := xs:integer($num mod 60)
  let $mm1 := xs:integer($num div 60)
  let $mm := xs:integer($mm1 mod 60)
  let $hh := xs:integer($mm1 div 60)
  let $hhFinal := if (string-length(xs:string($hh)) >= 2) then $hh else (concat('0', $hh))
  let $mmFinal := if (string-length(xs:string($mm)) >= 2) then $mm else (concat('0', $mm))
  let $ssFinal := if (string-length(xs:string($ss)) >= 2) then $ss else (concat('0', $ss))
  return concat($hhFinal, ':', $mmFinal, ':', $ssFinal)
};

declare function local:getMMSSFromNumber($num as xs:double) {
  let $ss := xs:integer($num mod 60)
  let $mm := xs:integer($num div 60)
  let $mmFinal := if (string-length(xs:string($mm)) >= 2) then $mm else (concat('0', $mm))
  let $ssFinal := if (string-length(xs:string($ss)) >= 2) then $ss else (concat('0', $ss))
  return concat($mmFinal, ':', $ssFinal)
};

let $subbrk := (
  collection("867509_Voice_OCTOBER-19_Unbilled_1")
    /SUBCUSTBRK[AccNo[@NO = (6945045)]]
      [DOB >= '2019-10-01' and DOB <= '2019-10-31'])
let $detailUsageTxn := $subbrk/DETAIL/TRANSACTION[@Usage = 'usage']
let $DistCallTypes := distinct-values($detailUsageTxn/SUB_SECTION/@Type)
let $allmonths := distinct-values($subbrk/Month)
let $rowcnt := count($detailUsageTxn/SUB_SECTION)
let $months := $allmonths[position() le 1]
let $opttype := 'NumVal'
let $durop := 'hh:mm:ss'
return (
  if ($DistCallTypes != '') then (
    for $month in $months
    (: element constructors stripped by the archive here as well :)
    return { $month } {
      if ($opttype = 'NumVal') then
        for $ct in $DistCallTypes
        let $ctCnt := sum($detailUsageTxn/SUB_SECTION
          [data(@Type) = data($ct) and data(../../../Month) = $month]/@Calls)
        return element { replace(concat('_', data($ct)),
          '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\)| |')', '_') }
          { xs:decimal($ctCnt) }
      else if ($opttype = 'NumPerViewPoint') then
        for $ct in $DistCallTypes
        let $ctCnt := sum($detailUsageTxn/SUB_SECTION
          [data(@Type) = data($ct) and data(../../../Month) = $month]/@Calls)
        let $ctTotalCnt := sum($detailUsageTxn/SUB_SECTION[data(@Type) = data($ct)]/@Calls)
        let $ctPer := concat(
          xs:decimal(round-half-to-even((($ctCnt div $ctTotalCnt) * 100), 3)), '%')
        return element { replace(concat('_', data($ct)),
          '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\)| |')', '_') }
          { $ctPer }
      else if ($opttype = 'NumPerCallType') then
        for $ct in $Dist
(: the message is truncated at this point in the archive :)
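One plausible reason the first report takes 10 minutes is that local:getLabelName walks the whole label collection twice for every bill number it is called with. A possible restructuring, sketched below and not tested against the real data, builds the serial-number lookup once and then answers each call with a map access:

```xquery
(: build the ser_no -> label lookup once, instead of scanning
   collection('LabelCollection_1573624392934') on every call :)
declare variable $local:labels :=
  map:merge(
    for $row in collection('LabelCollection_1573624392934')//row
    group by $no := string($row/@ser_no)
    where count($row) eq 1 and data($row[1]) != ''
    return map:entry($no, data($row[1]))
  );

declare function local:getLabelName($serNo as xs:string) {
  (: () when the serial number is absent or ambiguous, as before :)
  ($local:labels($serNo), '')[1]
};
```

With this shape, each lookup is a constant-time map access instead of two collection scans, which should matter when the function is evaluated once per grouped bill number.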
Re: [basex-talk] Huge No of XML files.
> Pls help me.

Trying. Could you try to help us first and answer all of the questions in my initial reply?
Re: [basex-talk] Huge No of XML files.
Hi Christian,

Every day we are generating up to 50k XML files per account. At the time of collection generation I need to generate a report on the selected account, but the collection gets terminated immediately, so I am not able to generate the report. Even creating the collection takes more than 15 hours for that particular account.

Pls help me.

Regards,
YSL

On Tue, Dec 17, 2019 at 12:27 PM Christian Grün wrote:
> Hi YSL,
>
> > Every day we are moving collections around 55k to 60k no of xml files
> > large account. Its taking more than 18 hours. At that time we want access
> > the collection for generating report its on lock mode and scrip that
> > collection.
>
> Some questions back:
>
> 1. What is the average size of each of your documents?
> 2. Will this sum up to a yearly amount of appr. 20 million files, or
>    do you have an upper limit?
> 3. Which of our API(s) are you using to add the documents?
> 4. What do your queries for reporting data look like?
>
> > Please help and do needful.
>
> https://ell.stackexchange.com/a/17626 ;)
>
> Best,
> Christian
Re: [basex-talk] Huge No of XML files.
Hi YSL,

> Every day we are moving collections around 55k to 60k no of xml files
> large account. Its taking more than 18 hours. At that time we want access
> the collection for generating report its on lock mode and scrip that
> collection.

Some questions back:

1. What is the average size of each of your documents?
2. Will this sum up to a yearly amount of appr. 20 million files, or
   do you have an upper limit?
3. Which of our API(s) are you using to add the documents?
4. What do your queries for reporting data look like?

> Please help and do needful.

https://ell.stackexchange.com/a/17626 ;)

Best,
Christian
[basex-talk] Huge No of XML files.
Hi Team,

We have a big problem for creating the collection bcoz of large no of xmls. Every day we are moving collections around 55k to 60k no of xml files large account. Its taking more than 18 hours. At that time we want access the collection for generating report its on lock mode and scrip that collection.

Please help and do needful.

Regards,
YSL