Re: [basex-talk] Out Of Memory
Hi Mansi,

From what I can see, for each pqr value you could use db:attribute-range to retrieve all the file names, then group by/count to obtain statistics. You could also create a new collection from an extraction of only the data you need, changing @name into an element, and use full-text fuzzy matching.

Hope it helps.

Best regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Mansi Sheth
Sent: Thursday, 6 November 2014 20:55
To: Christian Grün
Cc: BaseX
Subject: Re: [basex-talk] Out Of Memory

I would be doing tons of post-processing. I never use the UI; I either use REST through cURL or the command line.

I basically need the data in the following format:

    XML File Name, @name

I am trying to whitelist values, picking up only those matching starts-with(@name,pqr), where pqr is a list of 150-odd values. My file names are essentially IDs/keys, which I need to map further to some values using sqlite, and maybe group by them, etc.

So, basically, I am trying to visualize some data based on which XML files it exists in. So yes, count(query) would work, but it wouldn't serve much purpose, since I still need the value pqr.

- Mansi

On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün christian.gr...@gmail.com wrote:

Query: /A/*//E/@name/string()

In the GUI, all results will be cached, so you could think about switching to command line. Do you really need to output all results, or do you do some further processing with the intermediate results?

For example, the query count(/A/*//E/@name/string()) will probably run without getting stuck.

This query was going OOM within a few minutes. I tried a few ways of whitelisting, with a contains clause, to truncate the result set. That didn't help either. So, now I am out of ideas. This is giving the JVM 10GB of dedicated memory.

Once the above query works and doesn't go Out Of Memory, I also need the corresponding file names:

    XYZ.xml //E/@name
    PQR.xml //E/@name

Let me know if you need more details to understand the issue.

- Mansi

On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote:

Hi Mansi,

I think we need more information on the queries that are causing the problems.

Best,
Christian

On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote:

Hello,

I have a use case where I have to extract lots of information from each XML in each DB. Something like the attribute values of most of the nodes in an XML. Such queries go Out Of Memory with the exception below. I am giving it ~12GB of RAM on an i7 processor. Well, I can't complain here since I am most definitely asking for loads of data, but is there any way I can get this kind of data successfully?

    mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
    BaseX 8.0 beta b45c1e2 [Server]
    Server was started (port: 1984)
    HTTP Server was started (port: 8984)
    Exception in thread "qtp2068921630-18" java.lang.OutOfMemoryError: Java heap space
      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
      at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
      at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
      at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
      at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
      at java.lang.Thread.run(Thread.java:744)

-- 
- Mansi
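The post-processing Mansi describes (file name and @name pairs, a whitelist of prefixes, then a group-by) could be sketched outside BaseX roughly as follows. This is only an illustration in Python: it assumes the query output has been saved as one tab-separated "path, name" pair per line, and the function name `count_files_per_name` and the sample values are hypothetical, not taken from the thread.

```python
import csv
import io
from collections import defaultdict


def count_files_per_name(tsv_text, prefixes):
    """Count in how many distinct files each whitelisted @name value occurs.

    tsv_text: query output, one "path<TAB>name" pair per line (assumed format).
    prefixes: tuple of allowed name prefixes (the "pqr" whitelist).
    """
    files_by_name = defaultdict(set)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        path, name = row
        # str.startswith accepts a tuple, mirroring starts-with(@name, pqr)
        if name.startswith(prefixes):
            files_by_name[name].add(path)
    return {name: len(paths) for name, paths in files_by_name.items()}


if __name__ == "__main__":
    sample = "XYZ.xml\tpqr.alpha\nPQR.xml\tpqr.alpha\nXYZ.xml\tother\n"
    print(count_files_per_name(sample, ("pqr",)))
```

Because the counts are kept per name in sets of paths, a name that occurs many times in one file is still counted once for that file, which matches the "used by most customers" question better than a raw occurrence count.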
Re: [basex-talk] Out Of Memory
Hi Mansi,

Once, above query works and doesn't go Out Of Memory, I also need corresponding file names too:

Sorry, I skipped this one. Here is one way to do it:

    declare option output:item-separator "&#xa;";
    for $db in db:open('')
    let $path := db:path($db)
    for $name in $db//E/@name
    return $path || out:tab() || $name

I was surprised to hear that you are getting OOM errors on the command line, because the query you mentioned should then be evaluated in a streaming fashion (i.e., it should require very low and constant memory).

Could you try the above query? If it fails, could you possibly send me the query plan? On the command line, it can be retrieved via the -x flag.

I just remember that you have been using xquery:eval, right? My guess is that it occurs in combination with this function, because it may require all results to be cached before they are sent back to the client. Do you think you can alternatively put your queries into files, or do you need more flexibility?

Christian

On Thu, Nov 6, 2014 at 8:58 PM, Mansi Sheth mansi.sh...@gmail.com wrote:

Briefly explaining: I am trying to extract these values per XML file (where the .xml file names are IDs), to map them to their corresponding values.

Imagine you have 100s of customers, and each customer uses/needs 1000s of different @name values. These @name values would be similar across customers, but some customers would use some values and other customers others. I am trying to collect all this information and find which @name is used by the most customers, and so on and so forth.

There are a few such use cases, this one being the most generic.

On Thu, Nov 6, 2014 at 11:23 AM, Fabrice Etanchaud fetanch...@questel.com wrote:

The solution depends on how you will use your extraction. May I ask what your extraction is for?
Best regards,
Fabrice

From: Mansi Sheth [mailto:mansi.sh...@gmail.com]
Sent: Thursday, 6 November 2014 17:11
To: Fabrice Etanchaud
Cc: Christian Grün; BaseX
Subject: Re: [basex-talk] Out Of Memory

Interesting idea. I thought of using db partitioning, but didn't pursue it further, mainly due to the thought process below.

Currently, I am ingesting ~3000 XML files, storing ~50 XML files per db, and this will grow quickly. So the approach below would lead to ~3000 more files (and increasing), which would increase I/O operations considerably for further pre-processing.

However, I don't really care if the process takes a few minutes to a few hours (as long as it's not day(s) ;)). Given the situation and my options, I will surely try this.

The database is currently indexed at attribute level, as that's what I would be querying the most. Do you think I should do anything differently?

Thanks,
- Mansi

On Thu, Nov 6, 2014 at 10:48 AM, Fabrice Etanchaud fetanch...@questel.com wrote:

Hi Mansi,

Here you have a natural partition of your data: the files you ingested. So my first suggestion would be to query your data on a file basis:

    for $doc in db:open('your_collection_name')
    let $file-name := db:path($doc)
    return file:write(
      $file-name,
      <names>{
        for $name in $doc//E/@name/data()
        return <name>{ $name }</name>
      }</names>
    )

Is it for indexing?

Hope it helps,

Best regards,
Fabrice Etanchaud
Questel/Orbit

From: basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Mansi Sheth
Sent: Thursday, 6 November 2014 16:33
To: Christian Grün
Cc: BaseX
Subject: Re: [basex-talk] Out Of Memory

This would need a lot of details, so bear with me below:

Briefly, my XML files look like:

    <A name="">
    <B name="">
    <C name="">
    <D name="">
    <E name=""/>

A can contain B, C or D, and B, C or D can contain E. We have 1000s (currently 3000 in my test data set) of such XML files, of size 50MB on average. It's tons of data! Currently, my database is ~18GB in size.

Query: /A/*//E/@name/string()

This query was going OOM within a few minutes. I tried a few ways of whitelisting, with a contains clause, to truncate the result set. That didn't help either. So, now I am out of ideas. This is giving the JVM 10GB of dedicated memory.

Once the above query works and doesn't go Out Of Memory, I also need the corresponding file names:

    XYZ.xml //E/@name
    PQR.xml //E/@name

Let me know if you need more details to understand the issue.

- Mansi

On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote:

Hi Mansi,

I think we need more information on the queries that are causing the problems.

Best,
Christian

On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote:

Hello,

I have a use case where I have to extract lots of information from each XML in each DB. Something like the attribute values of most of the
Re: [basex-talk] Out Of Memory
do you need more flexibility?

To partially answer my own question, it might be interesting for you to hear that you have various ways of specifying queries via REST [1]:

* You can store your query server-side and use the ?run=... argument to evaluate this query file
* You can send a POST request, which contains the query to be evaluated.

In both cases, intermediate results won't be cached, but directly streamed back to the client.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/REST

On Fri, Nov 7, 2014 at 10:48 AM, Christian Grün christian.gr...@gmail.com wrote:

    declare option output:item-separator "&#xa;";
    for $db in db:open('')
    let $path := db:path($db)
    for $name in $db//E/@name
    return $path || out:tab() || $name
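For the POST variant mentioned above, a client-side sketch might look like the following. It only builds the request and does not send it. The assumptions here: the request body uses the `<query>`/`<text>` wrapper in the `http://basex.org/rest` namespace as described on the REST wiki page [1], the URL and database name `mydb` are placeholders, and authentication is left out entirely.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_rest_query(url, xquery):
    """Build (but do not send) a POST request carrying an XQuery string.

    Assumption: the server accepts a <query><text>...</text></query> body
    in the http://basex.org/rest namespace, per the REST documentation.
    """
    body = '<query xmlns="http://basex.org/rest"><text>{}</text></query>'.format(
        escape(xquery)  # escape &, < and > so the body stays well-formed XML
    )
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL; port 8984 is the HTTP port shown in the server log.
    req = build_rest_query("http://localhost:8984/rest/mydb",
                           "for $n in //E/@name return string($n)")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually run the query one would pass the request to `urllib.request.urlopen` and read the response in chunks rather than all at once, so that the client does not undo the streaming the server provides.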
Re: [basex-talk] Tomcat - Multiple BaseX Services - Startup Problems
Hi Bridger,

I would be interested to know if you have set HTTPLOCAL to true or false? In the latter case, the database server will be started as well (which allows you to use the client bindings, basexclient, etc.). In that case, you may also need to change the EVENTPORT [2].

As you see on the referenced Wiki page, there are some more ports (like STOPPORT), which you could try to change in each web.xml file.

Please tell me if this brings you any further. If yes, we should surely add some more information in our Wiki.

Thanks,
Christian

[1] http://docs.basex.org/wiki/Web_Application#Configuration
[2] http://docs.basex.org/wiki/Options#EVENTPORT

On Fri, Nov 7, 2014 at 4:23 AM, Bridger Dyson-Smith bdysonsm...@gmail.com wrote:

Hi all,

On Tue, Nov 4, 2014 at 10:31 AM, Bridger Dyson-Smith bdysonsm...@gmail.com wrote:

My apologies -- I accidentally clicked 'send'. :/

On Tue, Nov 4, 2014 at 10:12 AM, Bridger Dyson-Smith bdysonsm...@gmail.com wrote:

Hi all,

I know that there aren't many Tomcat users on this list, and that I'm echoing previous emails to the list, but I wanted to see if anyone here had encountered this issue. Some pieces of this may be steps forward to solving problems that others on the list have mentioned; I've cc'd you in the hopes that the information is helpful.

Tomcat can be configured to run multiple services, on varying ports, in the $TOMCAT_HOME/conf/server.xml file; e.g. [1].
I have two nearly-identical WARs in the following directory structure:

    `-- tree -L 1 apache-tomcat-7.0.53
    apache-tomcat-7.0.53
    ├── LICENSE
    ├── NOTICE
    ├── RELEASE-NOTES
    ├── RUNNING.txt
    ├── bin
    ├── conf
    ├── lib
    ├── logs
    ├── logs2
    ├── temp
    ├── webapps
    ├── webapps2
    └── work

    `-- tree -L 1 webapps
    webapps
    ├── BaseX79
    ├── BaseX79.war
    ├── ROOT
    ├── docs
    ├── examples
    ├── fop
    ├── fop.war
    ├── host-manager
    ├── imagemanip
    ├── imagemanip.war
    ├── lukeall-1.0.1.jar
    ├── manager
    ├── retailer
    ├── retailer.war
    ├── saxon
    ├── saxon.war
    ├── spc
    ├── spc.war
    ├── static
    ├── utk-xtf
    ├── utk-xtf-frameless
    ├── utk-xtf-frameless.war
    └── utk-xtf.war

    `-- tree -L 1 webapps2
    webapps2
    ├── ROOT
    ├── bX79
    ├── bX79.war
    ├── docs
    ├── examples
    ├── host-manager
    └── manager

Now, the problem is that I get a port conflict message from Tomcat [2] and only the first WAR ($TOMCAT_HOME/webapps/BaseX79) loads. E.g. there are some minor textual changes between the two restxq.xqm files.

I've tried to add a .basex file (webapps2/bX79/.basex) that specifies a different port, and I've also tried adding that information in webapps2/bX79/WEB-INF/web.xml as context-params [3].

I'm planning to email the Tomcat-users list to see if someone there can shed more light on this; i.e. is this a problem with the way that server.xml is configured, etc. However, I was also curious if anyone here had any thoughts or suggestions on this setup. Am I missing a step; e.g. should I be incorporating a compilation step - generating a WAR file - or something else?

I apologize for the breadth of the questions - I've jumped into the middle of a problem and now I'm trying to work my way back out to the start.

Thank you for your time and trouble.
Best,
Bridger

[1] example server.xml:

    <Server port="8005" shutdown="SHUTDOWN">
      <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on"/>
      <Listener className="org.apache.catalina.core.JasperListener"/>
      <Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener"/>
      <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener"/>
      <Listener className="org.apache.catalina.core.ThreadLocalLeakPreventionListener"/>
      <GlobalNamingResources>
        <Resource name="UserDatabase" auth="Container"
                  type="org.apache.catalina.UserDatabase"
                  description="User database that can be updated and saved"
                  factory="org.apache.catalina.users.MemoryUserDatabaseFactory"
                  pathname="conf/tomcat-users.xml"/>
      </GlobalNamingResources>
      <Service name="Catalina">
        <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="2"
                   redirectPort="8443" maxThreads="125" minSpareThreads="25"
                   maxSpareThreads="75" enableLookups="false" acceptCount="100"/>
        <Connector port="8009" protocol="AJP/1.3" redirectPort="8443"/>
        <Engine name="Catalina" defaultHost="localhost">
          <Realm className="org.apache.catalina.realm.LockOutRealm">
            <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase"/>
          </Realm>
          <Host name="localhost" appBase="webapps" unpackWARs="true" autoDeploy="true">
            <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="localhost_access_log." suffix=".txt"
Re: [basex-talk] Feature Request: serializable functions
Hi Andy,

Do you mean they invoke internal BaseX functions?

Yes, exactly. One example is range checks. The following query…

    <x>123</x>[text() > 1 and text() < 400]

…will result in the following optimized query string:

    element x { (123) }[1.0 < text() < 400.0]

The query plan looks as follows:

    <CmpR min="1.0" max="400.0">
      <CachedPath>
        <IterStep axis="child" test="text()"/>
      </CachedPath>
    </CmpR>

If these serializations are post optimization does mean they MAY embody assumptions about the context that are only valid at the moment they were made?

They will absolutely do so. The following query…

    map:serialize(map{ 'x': try { db:open('i-do-not-exist') } catch * { '-' } })

…results in { x: - } (if i-do-not-exist does not exist).

If we decide to introduce a function like inspect:serialize, we should by all means stress that the output is not an equivalent representation of the input function, and point out that it will be the compiled query result in the given context. As the inspect:function would always be evaluated at runtime, we don't have a representation of the original query anymore (it's simply not required anywhere else), so this is actually the only thing we can currently offer at this stage.

Thanks for asking,
Christian
[basex-talk] unexpected behaviour
Hi,

Running the attached program produces the following evaluation:

    Evaluating:
    - elementSeq: <A ref="var1"><B><X1/></B></A>
    - content0: <X1/>
    - content2: <X1/>
    - elementSeq: <A ref="var1"><B><X2/></B></A>
    - content0: <X2/>
    - content2: <X1/>

I can't explain the value of the last "- content2: " trace. I expect it to be the same value as the last "- content0: " trace.

What's happening here?

- Rob Stapper

Attachment: elementmerge_test.xq (binary data)
Re: [basex-talk] unexpected behaviour
Hi Rob,

thanks for the example code. Could you possibly reduce it to a shorter snippet that still demonstrates the surprising behavior?

Best,
Christian

On Fri, Nov 7, 2014 at 6:00 PM, Rob Stapper r.stap...@lijbrandt.nl wrote:

Hi,

Running the attached program produces the following evaluation:

    Evaluating:
    - elementSeq: <A ref="var1"><B><X1/></B></A>
    - content0: <X1/>
    - content2: <X1/>
    - elementSeq: <A ref="var1"><B><X2/></B></A>
    - content0: <X2/>
    - content2: <X1/>

I can't explain the value of the last "- content2: " trace. I expect it to be the same value as the last "- content0: " trace.

What's happening here?

- Rob Stapper
Re: [basex-talk] Out Of Memory
This email chain is extremely helpful. Thanks a ton, guys. Certainly some of the most helpful folks here :)

I have to try a lot of these suggestions, but currently I am being pulled into something else, so I have to pause for the time being. I will get back to this email thread after trying a few things, with my relevant observations.

- Mansi

On Fri, Nov 7, 2014 at 3:48 AM, Fabrice Etanchaud fetanch...@questel.com wrote:

Hi Mansi,

From what I can see, for each pqr value you could use db:attribute-range to retrieve all the file names, then group by/count to obtain statistics. You could also create a new collection from an extraction of only the data you need, changing @name into an element, and use full-text fuzzy matching.

Hope it helps.

Best regards,
Fabrice

From: basex-talk-boun...@mailman.uni-konstanz.de [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Mansi Sheth
Sent: Thursday, 6 November 2014 20:55
To: Christian Grün
Cc: BaseX
Subject: Re: [basex-talk] Out Of Memory

I would be doing tons of post-processing. I never use the UI; I either use REST through cURL or the command line.

I basically need the data in the following format:

    XML File Name, @name

I am trying to whitelist values, picking up only those matching starts-with(@name,pqr), where pqr is a list of 150-odd values. My file names are essentially IDs/keys, which I need to map further to some values using sqlite, and maybe group by them, etc.

So, basically, I am trying to visualize some data based on which XML files it exists in. So yes, count(query) would work, but it wouldn't serve much purpose, since I still need the value pqr.

- Mansi

On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün christian.gr...@gmail.com wrote:

Query: /A/*//E/@name/string()

In the GUI, all results will be cached, so you could think about switching to command line. Do you really need to output all results, or do you do some further processing with the intermediate results?
Re: [basex-talk] Dynamic Evaluation of XQUERY
Christian,

I am running out of ideas for debugging this. When I execute this query directly within an XQuery file, it works perfectly. It only breaks when I pass it through the command line. In fact, the actual .xq file doesn't matter either; as you pointed out, parsing from the command line is broken. I tried the -d switch and escaping spaces, but that didn't help. I also tested that this is a valid XPath query.

Please pardon my XQuery knowledge, it's really not my background.

- Mansi

On Thu, Nov 6, 2014 at 8:45 AM, Christian Grün christian.gr...@gmail.com wrote:

Hi Mansi,

    ~/Downloads/basex/bin/basex -bn='/Archives/*//class[contains(@name,abc) and contains(@name,pqr)]' get_paths.xq
    Stopped at /Users/mansiadmin/Documents/Research-Projects/BigData, 1/4:
    [XPDY0002] and: no context value bound.

It seems that and was interpreted as XPath step, so it seems as if something went wrong when parsing your query on command line (I doubt that it's something specific to BaseX). Maybe you can simply try to output the query that causes the error, instead of trying to evaluate it?

Christian

However, the query below works like a charm:

    ~/Downloads/basex/bin/basex -bn='/Archives/*//class[contains(@name,abc)]' get_paths.xq

I am hoping that, for the first query above, it's some syntactic issue at my end. But I couldn't fix it, so I thought I should point it out. Please advise.

Code:

    declare variable $n as xs:string external;
    declare option output:item-separator "&#xa;";
    let $aPath :=
      for $db in db:list()
      let $query :=
        "declare variable $db external; " ||
        "db:open($db)" || $n
      return xquery:eval($query, map { 'db': $db, 'query': $n })
    let $paths :=
      for $elem in $aPath
      return db:path($elem)
    return distinct-values($paths)

On Mon, Nov 3, 2014 at 6:48 PM, Christian Grün christian.gr...@gmail.com wrote:

…in the meanwhile, could you please check if the bug has possibly been fixed in the latest 8.0 snapshot [1]?

[1] http://files.basex.org/releases/latest

On Tue, Nov 4, 2014 at 12:46 AM, Christian Grün christian.gr...@gmail.com wrote:

Improper use? Potential bug? Your feedback is welcome:

Sounds like a little bug indeed; I will check it tomorrow!

    Contact: basex-talk@mailman.uni-konstanz.de
    Version: BaseX 7.9
    Java: Oracle Corporation, 1.7.0_45
    OS: Mac OS X, x86_64
    Stack Trace:
    java.lang.NullPointerException
      at org.basex.query.value.item.Str.get(Str.java:49)
      at org.basex.query.func.FNDb.path(FNDb.java:489)
      at org.basex.query.func.FNDb.item(FNDb.java:128)
      at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:45)
      at org.basex.query.func.FNDb.iter(FNDb.java:92)
      at org.basex.query.gflwor.GFLWOR$2.next(GFLWOR.java:78)
      at org.basex.query.MainModule$1.next(MainModule.java:98)
      at org.basex.core.cmd.AQuery.query(AQuery.java:91)
      at org.basex.core.cmd.XQuery.run(XQuery.java:22)
      at org.basex.core.Command.run(Command.java:329)
      at org.basex.core.Command.execute(Command.java:94)
      at org.basex.server.LocalSession.execute(LocalSession.java:121)
      at org.basex.server.Session.execute(Session.java:37)
      at org.basex.core.CLI.execute(CLI.java:106)
      at org.basex.BaseX.<init>(BaseX.java:123)
      at org.basex.BaseX.main(BaseX.java:42)

On Thu, Oct 30, 2014 at 5:54 AM, Christian Grün christian.gr...@gmail.com wrote:

Hi Mansi,

you have been close! It could work with the following query (I haven't tried it out, though):

    get_query_result.xq:

    declare variable $n external;
    declare option output:item-separator "&#xa;";
    let $aList :=
      for $name in db:list()
      let $db := db:open($name)
      return xquery:eval($n, map { '': $db })
    return distinct-values($aList)

In this code, I'm opening the database in the main loop, and I then bind it to the empty string. This way, the database will be the context of the query to be evaluated, and you won't have to deal with bugs that arise from the concatenation of db:open and the query string.

1. Can we assign dynamic values as a value to a map's key?
2. Can a map have more than one key in xquery:eval?

This is both possible. As you see in the following query, you'll again have to declare the variables that you want to bind. I agree this causes a lot of code, so we may simplify it again in a future version of BaseX:

    let $n := "/a/b/c"
    for $db in db:list()
    let $query :=
      "declare variable $db external; " ||
      "db:open($db)" || $n
    return xquery:eval($query, map { 'db': $db, 'query': $n })

Best,
Christian

-- 
- Mansi
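Following Christian's suggestion to look at the query string before evaluating it, one quick check is to see how a POSIX shell would tokenize the failing command line. The sketch below uses Python's `shlex` module as an approximation of shell word splitting; it does not model the `basex` launcher script or BaseX's own handling of `-b`, so it can only rule the shell's quoting in or out as the culprit. The helper name `binding_from_command` is hypothetical.

```python
import shlex


def binding_from_command(command_line):
    """Return the (name, value) pair of the first -b binding, or None.

    shlex.split approximates POSIX shell word splitting: text inside
    single quotes stays in one token, even if it contains spaces.
    """
    for token in shlex.split(command_line):
        if token.startswith("-b"):
            name, _, value = token[2:].partition("=")
            return name, value
    return None


if __name__ == "__main__":
    cmd = ("basex -bn='/Archives/*//class[contains(@name,abc) "
           "and contains(@name,pqr)]' get_paths.xq")
    print(binding_from_command(cmd))
```

With the single quotes in place, the whole path expression, including the spaces around `and`, arrives as one argument. If the query still breaks at `and`, the splitting is more likely happening after the shell has handed the arguments over.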