Re: [basex-talk] Runtime Exception
Upgraded to the latest BaseX version. Still getting runtime errors like this:

HTTP/1.1 400 Bad Request
Content-Type: text/plain;charset=UTF-8
Content-Length: 3222
Server: Jetty(8.1.18.v20150929)

Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.4.4
Java: Oracle Corporation, 1.7.0_95
OS: Linux, amd64
Stack Trace:
java.lang.NullPointerException
  at org.basex.data.DiskData.write(DiskData.java:120)
  at org.basex.data.DiskData.close(DiskData.java:140)
  at org.basex.core.Datas.unpin(Datas.java:53)
  at org.basex.core.cmd.Close.close(Close.java:45)
  at org.basex.query.QueryResources.close(QueryResources.java:108)
  at org.basex.query.QueryContext.close(QueryContext.java:603)
  at org.basex.query.QueryProcessor.close(QueryProcessor.java:262)
  at org.basex.core.cmd.AQuery.query(AQuery.java:99)
  at org.basex.core.cmd.XQuery.run(XQuery.java:22)
  at org.basex.core.Command.run(Command.java:398)
  at org.basex.http.rest.RESTCmd.run(RESTCmd.java:99)
  at org.basex.http.rest.RESTQuery.query(RESTQuery.java:74)
  at org.basex.http.rest.RESTRun.run0(RESTRun.java:41)
  at org.basex.http.rest.RESTCmd.run(RESTCmd.java:65)
  at org.basex.core.Command.run(Command.java:398)
  at org.basex.core.Command.execute(Command.java:100)
  at org.basex.core.Command.execute(Command.java:123)
  at org.basex.http.rest.RESTServlet.run(RESTServlet.java:22)
  at org.basex.http.BaseXServlet.service(BaseXServlet.java:64)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:370)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:231)
  at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:745)

On Fri, Apr 29, 2016 at 1:49 PM, Christian Grün <christian.gr...@gmail.com> wrote:
> Hi Mansi,
>
> This error shouldn’t show up anymore with more recent versions of
> BaseX. Could you try the latest version?
>
> Regarding the new error, could you please tell us more about what you
> were doing with your data? Did you read and write at the same time?
> Did you use different BaseX instances to access the data in parallel?
>
> Thanks
> Christian
>
> On Fri, Apr 29, 2016 at 6:28 PM, Mansi Sheth <mansi.sh...@gmail.com> wrote:
> > Hello,
> >
> > So, now I am stuck. I am not even able to access any database:
> >
> > ubuntu@ip-10-0-0-83:~$ basex
> > BaseX 8.2.3 [Standalone]
> > Try help to get more information.
> >
> >> list
> > Improper use? Potential bug? Your feedback is welcome:
> > Contact: basex-talk@mailman.uni-konstanz.de
> > Version: BaseX 8.2.3
> > Java: Oracle Corporation, 1.7.0_95
> > OS: Linux, amd64
> > Stack Trace:
> > java.lang.ArrayIndexOutOfBoundsException: 0
> > at org.basex.util.Version.<init>(Version.java:33)
> > at org.basex.util.Version.<init>(Version.java:24)
> > at org.basex.dat
Re: [basex-talk] Runtime Exception
Hello, So, now I am stuck. I am not even able to access any database: ubuntu@ip-10-0-0-83:~$ basex BaseX 8.2.3 [Standalone] Try help to get more information. > list Improper use? Potential bug? Your feedback is welcome: Contact: basex-talk@mailman.uni-konstanz.de Version: BaseX 8.2.3 Java: Oracle Corporation, 1.7.0_95 OS: Linux, amd64 Stack Trace: java.lang.ArrayIndexOutOfBoundsException: 0 at org.basex.util.Version.(Version.java:33) at org.basex.util.Version.(Version.java:24) at org.basex.data.MetaData.read(MetaData.java:315) at org.basex.data.MetaData.read(MetaData.java:262) at org.basex.core.cmd.List.list(List.java:83) at org.basex.core.cmd.List.run(List.java:52) at org.basex.core.Command.run(Command.java:398) at org.basex.core.Command.execute(Command.java:100) at org.basex.api.client.LocalSession.execute(LocalSession.java:132) at org.basex.api.client.Session.execute(Session.java:36) at org.basex.core.CLI.execute(CLI.java:103) at org.basex.core.CLI.execute(CLI.java:87) at org.basex.BaseX.console(BaseX.java:191) at org.basex.BaseX.(BaseX.java:166) at org.basex.BaseX.main(BaseX.java:42) > On Tue, Apr 26, 2016 at 12:14 PM, Christian Grün <christian.gr...@gmail.com> wrote: > Hi Mansi, > > Thanks for the feedback. Errors like this sometime occur if databases > are requested from different JVMs at the same time. See e.g. [1] for > more information. > > Cheers > Christian > > [1] http://docs.basex.org/wiki/Startup#Concurrent_Operations > > > On Tue, Apr 26, 2016 at 6:02 PM, Mansi Sheth <mansi.sh...@gmail.com> > wrote: > > I did try the inspect command on all databases, which should there are no > > inconsistencies.I was logging any exceptions, in my java code, in case of > > errors, and that showed me, which database in particular was in problem, > > dropping it helped. > > > > This was the first time I saw it. I was worried, DB has grown till the > point > > of not being supported, when I actually panicked. 
> > > > Thanks, > > - Mansi > > > > On Tue, Apr 26, 2016 at 3:27 AM, Christian Grün < > christian.gr...@gmail.com> > > wrote: > >> > >> Dear Mansi, > >> > >> you could try to run the INSPECT command on the affected database, or > all > >> databases, in order to find out if your database has gone corrupt. Did > you > >> repeatedly come across this error? > >> > >> Best, > >> Christian > >> > >> Am 25.04.2016 16:45 schrieb "Mansi Sheth" <mansi.sh...@gmail.com>: > >> > > >> > Hello, > >> > > >> > My current BaseXDB is at 920GB, with ~230 databases... I run jetty > >> > server visa basexhttp script with giving it explicit 30GB of RAM. > While > >> > trying to access a query, thru REST api via XQUERY, I get below error. > >> > > >> > HTTP/1.1 400 Bad Request^M > >> > Content-Type: text/plain;charset=UTF-8^M > >> > Content-Length: 4207^M > >> > Server: Jetty(8.1.16.v20140903)^M > >> > ^M > >> > Improper use? Potential bug? Your feedback is welcome: > >> > Contact: basex-talk@mailman.uni-konstanz.de > >> > Version: BaseX 8.2.3 > >> > Java: Oracle Corporation, 1.7.0_95 > >> > OS: Linux, amd64 > >> > Stack Trace: > >> > java.lang.RuntimeException: Data Access out of bounds: > >> > - pre value: 126882320 > >> > - #used blocks: 495643 > >> > - #total locks: 495643 > >> > - access: 495642 (495643 > 495642] > >> > at org.basex.util.Util.notExpected(Util.java:60) > >> > at > >> > org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:458) > >> > at > >> > org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:148) > >> > at org.basex.data.Data.kind(Data.java:306) > >> > at org.basex.query.value.node.DBNode.(DBNode.java:51) > >> > at > org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:68) > >> > at > org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:22) > >> > at org.basex.query.value.seq.Seq$1.next(Seq.java:77) > >> > at org.basex.query.expr.path.IterPath$1.next(IterPath.java:58) > >> > at 
org.basex.query.expr.path.IterPath$1.next(IterPath.java:36) > >> > at org.basex.query.MainModule$1.next(MainModule.java:114) > >> > at > >> > org.basex.query.func.StandardFunc.cache(StandardFunc.java:384) > >> > at >
Re: [basex-talk] Runtime Exception
I did try the INSPECT command on all databases, which showed there are no inconsistencies. I was logging any exceptions in my Java code, and that showed me which database in particular had the problem; dropping it helped.

This was the first time I saw it. I panicked because I was worried the DB had grown to the point of no longer being supported.

Thanks,
- Mansi

On Tue, Apr 26, 2016 at 3:27 AM, Christian Grün <christian.gr...@gmail.com> wrote:
> Dear Mansi,
>
> you could try to run the INSPECT command on the affected database, or all
> databases, in order to find out if your database has gone corrupt. Did you
> repeatedly come across this error?
>
> Best,
> Christian
>
> On 25.04.2016 at 16:45, "Mansi Sheth" <mansi.sh...@gmail.com> wrote:
> >
> > Hello,
> >
> > My current BaseXDB is at 920 GB, with ~230 databases... I run the Jetty
> > server via the basexhttp script, giving it an explicit 30 GB of RAM. While
> > running a query through the REST API via XQuery, I get the error below.
> >
> > HTTP/1.1 400 Bad Request
> > Content-Type: text/plain;charset=UTF-8
> > Content-Length: 4207
> > Server: Jetty(8.1.16.v20140903)
> >
> > Improper use? Potential bug?
Your feedback is welcome: > > Contact: basex-talk@mailman.uni-konstanz.de > > Version: BaseX 8.2.3 > > Java: Oracle Corporation, 1.7.0_95 > > OS: Linux, amd64 > > Stack Trace: > > java.lang.RuntimeException: Data Access out of bounds: > > - pre value: 126882320 > > - #used blocks: 495643 > > - #total locks: 495643 > > - access: 495642 (495643 > 495642] > > at org.basex.util.Util.notExpected(Util.java:60) > > at > org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:458) > > at > org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:148) > > at org.basex.data.Data.kind(Data.java:306) > > at org.basex.query.value.node.DBNode.(DBNode.java:51) > > at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:68) > > at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:22) > > at org.basex.query.value.seq.Seq$1.next(Seq.java:77) > > at org.basex.query.expr.path.IterPath$1.next(IterPath.java:58) > > at org.basex.query.expr.path.IterPath$1.next(IterPath.java:36) > > at org.basex.query.MainModule$1.next(MainModule.java:114) > > at org.basex.query.func.StandardFunc.cache(StandardFunc.java:384) > > at > org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:129) > > at > org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:59) > > at > org.basex.query.func.xquery.XQueryEval.value(XQueryEval.java:49) > > at org.basex.query.expr.gflwor.GFLWOR.value(GFLWOR.java:77) > > at org.basex.query.QueryContext.value(QueryContext.java:421) > > at org.basex.query.expr.gflwor.Let$LetEval.next(Let.java:187) > > at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:95) > > at org.basex.query.MainModule$1.next(MainModule.java:114) > > at org.basex.core.cmd.AQuery.query(AQuery.java:91) > > at org.basex.core.cmd.XQuery.run(XQuery.java:22) > > at org.basex.core.Command.run(Command.java:398) > > at org.basex.http.rest.RESTCmd.run(RESTCmd.java:99) > > at org.basex.http.rest.RESTQuery.query(RESTQuery.java:74) > > at 
org.basex.http.rest.RESTRun.run0(RESTRun.java:41) > > at org.basex.http.rest.RESTCmd.run(RESTCmd.java:65) > > at org.basex.core.Command.run(Command.java:398) > > at org.basex.core.Command.execute(Command.java:100) > > at org.basex.core.Command.execute(Command.java:123) > > at org.basex.http.rest.RESTServlet.run(RESTServlet.java:22) > > at org.basex.http.BaseXServlet.service(BaseXServlet.java:64) > > at javax.servlet.http.HttpServlet.service(HttpServlet.java:848) > > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) > > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503) > > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) > > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) > > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:
[basex-talk] Runtime Exception
Hello,

My current BaseXDB is at 920 GB, with ~230 databases... I run the Jetty server via the basexhttp script, giving it an explicit 30 GB of RAM. While running a query through the REST API via XQuery, I get the error below.

HTTP/1.1 400 Bad Request
Content-Type: text/plain;charset=UTF-8
Content-Length: 4207
Server: Jetty(8.1.16.v20140903)

Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.2.3
Java: Oracle Corporation, 1.7.0_95
OS: Linux, amd64
Stack Trace:
java.lang.RuntimeException: Data Access out of bounds:
- pre value: 126882320
- #used blocks: 495643
- #total locks: 495643
- access: 495642 (495643 > 495642]
  at org.basex.util.Util.notExpected(Util.java:60)
  at org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:458)
  at org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:148)
  at org.basex.data.Data.kind(Data.java:306)
  at org.basex.query.value.node.DBNode.<init>(DBNode.java:51)
  at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:68)
  at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:22)
  at org.basex.query.value.seq.Seq$1.next(Seq.java:77)
  at org.basex.query.expr.path.IterPath$1.next(IterPath.java:58)
  at org.basex.query.expr.path.IterPath$1.next(IterPath.java:36)
  at org.basex.query.MainModule$1.next(MainModule.java:114)
  at org.basex.query.func.StandardFunc.cache(StandardFunc.java:384)
  at org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:129)
  at org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:59)
  at org.basex.query.func.xquery.XQueryEval.value(XQueryEval.java:49)
  at org.basex.query.expr.gflwor.GFLWOR.value(GFLWOR.java:77)
  at org.basex.query.QueryContext.value(QueryContext.java:421)
  at org.basex.query.expr.gflwor.Let$LetEval.next(Let.java:187)
  at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:95)
  at org.basex.query.MainModule$1.next(MainModule.java:114)
  at org.basex.core.cmd.AQuery.query(AQuery.java:91)
  at org.basex.core.cmd.XQuery.run(XQuery.java:22)
  at org.basex.core.Command.run(Command.java:398)
  at org.basex.http.rest.RESTCmd.run(RESTCmd.java:99)
  at org.basex.http.rest.RESTQuery.query(RESTQuery.java:74)
  at org.basex.http.rest.RESTRun.run0(RESTRun.java:41)
  at org.basex.http.rest.RESTCmd.run(RESTCmd.java:65)
  at org.basex.core.Command.run(Command.java:398)
  at org.basex.core.Command.execute(Command.java:100)
  at org.basex.core.Command.execute(Command.java:123)
  at org.basex.http.rest.RESTServlet.run(RESTServlet.java:22)
  at org.basex.http.BaseXServlet.service(BaseXServlet.java:64)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:370)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:745)

--
- Mansi
[basex-talk] XQuery Help
Hello,

I need help with, hopefully, a simple XQuery. I want to extend the query below to run not on all documents in all databases, but only on those documents whose db:path (i.e. the original file name) contains "input string X". Any help appreciated.

declare variable $n as xs:string external;  (: command-line query, entered as the "n" variable :)
declare option output:item-separator "";    (: each element would be on a new line :)

(: run the input query on every XML document in every database :)
let $queryData :=
  for $db in db:list()
  (: assign dynamic variables to generate the query, to be used in eval :)
  let $query := "declare variable $db external; " || "db:open($db)" || $n
  return xquery:eval($query, map { 'db': $db, 'query': $n })
return distinct-values($queryData)

- Mansi
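[Editorial sketch, untested against your data: one way to restrict evaluation to matching documents is to filter each database by db:path before evaluating the query. $part is a variable name introduced here for "input string X"; the empty-string key in the bindings map sets the context item for xquery:eval in BaseX.]

```xquery
declare variable $n    as xs:string external;  (: query fragment, as above :)
declare variable $part as xs:string external;  (: part of the original file name :)

let $queryData :=
  for $db in db:list()
  (: keep only documents whose database path contains $part :)
  for $doc in db:open($db)[contains(db:path(.), $part)]
  (: evaluate the user query with the matching document as context item :)
  return xquery:eval($n, map { '': $doc })
return distinct-values($queryData)
```

This avoids building the query string by concatenation, since the document is passed as the evaluation context instead.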
Re: [basex-talk] Guidance on Indexing
Thanks Christian, as always a quick and detailed response.

1. I am not 100% clear if you are motivating me towards or against full-text indexing :)

2. Yes, I am dealing with GBs of XML files. I create new databases via the Java API, using the CreateDB class. Should I be using MainOptions to set the AUTOOPTIMIZE and UPDINDEX options before each new db creation? In the MainOptions class I didn't find any auto-optimize option; am I missing something? Since I am setting options through this method anyway, should I also set the FTINDEX or ATTRINDEX attribute (based on your response to 1) before creating each DB? I would hate to run an optimization script after each DB update (updates happen daily).

Please advise,
- Mansi

On Sun, Jan 3, 2016 at 4:52 PM, Christian Grün wrote:
> Hi Mansi,
>
> > 1. Most of my xqueries are of below nature
> >
> > '/Archives/descendant::apiCalls[contains(@name,"com.sun")]/@name', where
> > apiCalls could be 3-4 level under 'Archives'. Xqueries are accessed via REST
>
> The existing index structures won’t allow you to look for arbitrary
> substrings; see [1] for more information.
>
> You are right, the full-text index may be a possible way out. Prefix
> searches can be realized via the "using wildcards" option [2]:
>
>   //*[text() contains text "abc.*" using wildcards]
>
> Please note that the query string will always be "tokenized": if you
> are looking for "com.sun", you will also get results like "COM SUN!".
>
> > 2. I have 1000s of documents, spanning over 100 XML DB, with total space
> > around 400 GB currently. Each query is taking roughly 30 mins, to run.
> >
> > My concern is, at each DB update, I am using attribute indexing, but info
> > command on basex prompt tells me otherwise. Am I misreading something ? Is
> > there a way to fix this once DB is created ? Its takes me 48 hours, to
> > create DBs from scratch... :)
>
> If UPDINDEX and AUTOOPTIMIZE is false, you will need to call
> "OPTIMIZE" after your updates.
> If you create a new database, you can set UPDINDEX and AUTOOPTIMIZE to
> true. However, AUTOOPTIMIZE will get incredibly slow if you are
> working with gigabytes of XML data.
>
> > Reading thru UPDINDEX and AUTOOPTIMIZE ALL commands, tells me to open each
> > DB and run these commands. Is that my option ? Do we have a xquery script
> > somewhere which I can use to do this ?
>
> If your databases are called "db1" ... "db100", the following XQuery
> script will optimize all those databases:
>
>   for $i in 1 to 100
>   return db:optimize('db' || $i)
>
> You can also create a command script [3] with XQuery:
>
>   <commands>{
>     for $i in 1 to 100
>     return (
>       <open>{ 'db' || $i }</open>,
>       <optimize/>
>     )
>   }</commands>
>
> You can store the result as a .bxs file and run it afterwards.
>
> Before you create all index structures, you should probably run your
> queries on some smaller database instances and check out the "Query
> Info" panel in the GUI. It will tell you if an index is used or not.
>
> Best,
> Christian
>
> [1] http://docs.basex.org/wiki/Indexes#Value_Indexes
> [2] http://docs.basex.org/wiki/Full-Text#Match_Options
> [3] http://docs.basex.org/wiki/Commands#Command_Scripts

--
- Mansi
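[Editorial note: if the database names are not numbered consecutively, the numbered loop quoted above can be generalized by iterating over db:list() — a sketch, assuming the BaseX 8.x Database Module:]

```xquery
(: optimize every database on this instance, whatever its name :)
for $db in db:list()
return db:optimize($db)
```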
[basex-talk] Guidance on Indexing
Hello,

A very happy new year to all of you!

I have some very basic questions about indexing.

1. Most of my XQueries are of the nature below, where apiCalls could be 3-4 levels under 'Archives'. The queries are accessed via REST:

  /Archives/descendant::apiCalls[contains(@name,"com.sun")]/@name

Based on this, I used attribute indexing after each update to the DB. Am I correct? Should I have been using full-text indexing instead? Why?

2. I have 1000s of documents, spanning over 100 XML databases, with total space around 400 GB currently. Each query takes roughly 30 minutes to run. Though the performance is acceptable, I know I can do better with indexing. Currently, when I looked at one of the DBs:

> open bi_output_3
Database 'bi_output_3' was opened in 38.22 ms.
> info db
Database Properties
 Name: bi_output_3
 Size: 3938 MB
 Nodes: 16193129
 Documents: 35
 Binaries: 0
 Timestamp: 2016-01-03T13:40:40.000Z
Resource Properties
 Timestamp: 2016-01-03T13:40:40.776Z
 Encoding: UTF-8
 CHOP: true
Indexes
 Up-to-date: false
 TEXTINDEX: false
 ATTRINDEX: false
 FTINDEX: false
 LANGUAGE: English
 STEMMING: false
 CASESENS: false
 DIACRITICS: false
 STOPWORDS:
 UPDINDEX: false
 AUTOOPTIMIZE: false
 MAXCATS: 100
 MAXLEN: 96

When looking at its HDD footprint:

ubuntu@/BaseXDB/bi_output_3$ ls -l
total 4032992
-rw-rw-r-- 1 ubuntu ubuntu 2209449064 Jan  1 17:00 atv.basex
-rw-rw-r-- 1 ubuntu ubuntu          4 Jan  1 16:35 atvl.basex
-rw-rw-r-- 1 ubuntu ubuntu          0 Jan  1 16:35 atvr.basex
-rw-rw-r-- 1 ubuntu ubuntu       6414 Jan  3 13:40 doc.basex
-rw-rw-r-- 1 ubuntu ubuntu          6 Jan  1 17:00 ftxx.basex
-rw-rw-r-- 1 ubuntu ubuntu          0 Jan  1 17:00 ftxy.basex
-rw-rw-r-- 1 ubuntu ubuntu          0 Jan  1 17:00 ftxz.basex
-rw-rw-r-- 1 ubuntu ubuntu        829 Jan  3 13:40 inf.basex
-rw-rw-r-- 1 ubuntu ubuntu         28 Jan  1 17:00 swl.basex
-rw-rw-r-- 1 ubuntu ubuntu 1916444672 Jan  3 13:40 tbl.basex
-rw-rw-r-- 1 ubuntu ubuntu    3796037 Jan  3 13:40 tbli.basex
-rw-rw-r-- 1 ubuntu ubuntu      45462 Jan  1 17:00 txt.basex
-rw-rw-r-- 1 ubuntu ubuntu          4 Jan  1 16:35 txtl.basex
-rw-rw-r-- 1 ubuntu ubuntu          0 Jan  1 16:35 txtr.basex
ubuntu@/BaseXDB/bi_output_3$ pwd
/veracode/msheth/BaseXDB/bi_output_3

My concern is: at each DB update I am using attribute indexing, but the info command at the basex prompt tells me otherwise. Am I misreading something? Is there a way to fix this once the DB is created? It takes me 48 hours to create the DBs from scratch... :)

Reading through the UPDINDEX and AUTOOPTIMIZE ALL commands tells me to open each DB and run these commands. Is that my only option? Is there an XQuery script somewhere which I can use to do this?

Thanks,
- Mansi
Re: [basex-talk] [bxerr:BXDB0002] Too many open files
Thanks Joe for your input. I haven't tried all the options yet, but will surely go through them. I guess what I was trying to see is if there is a way I can optimize my XQueries to close open databases which they no longer need. Currently, my queries are of the nature below. I am wondering if there is a better way to deal with the two lines building and evaluating $query:

declare variable $n as xs:string external;
declare option output:item-separator "";

let $queryData :=
  for $db in db:list()
  let $query := "declare variable $db external; " || "db:open($db)" || $n
  return xquery:eval($query, map { 'db': $db, 'query': $n })
return distinct-values($queryData)

On Mon, Dec 7, 2015 at 1:30 PM, Joe Wicentowski <joe...@gmail.com> wrote:
> Hi Mansi,
>
> The results of ulimit can be misleading. See this article, which really
> helped me when I encountered this issue (though not with BaseX):
>
> https://underyx.me/2015/05/18/raising-the-maximum-number-of-file-descriptors.html
>
> Joe
>
> On Mon, Dec 7, 2015 at 1:22 PM, Mansi Sheth <mansi.sh...@gmail.com> wrote:
>
>> Thanks Christian,
>>
>> I had already set the open files limit on the OS:
>>
>> ubuntu@ip-10-0-0-83:~$ ulimit -Hn
>>
>> However, I still face the exact same problem. The process breaks at the
>> same db count:
>>
>> [bxerr:BXDB0002] Resource
>> "/veracode/msheth/BaseXDB/bi_output_715/inf.basex (Too many open files)"
>> not found.
>>
>> On Sat, Dec 5, 2015 at 7:53 AM, Christian Grün <christian.gr...@gmail.com> wrote:
>>
>>> Hi Mansi,
>>>
>>> If you are working with Linux, you may need to increase the maximum
>>> file limit with "ulimit -n" [1].
>>>
>>> Hope this helps,
>>> Christian
>>>
>>> [1] http://www.linuxhowtos.org/Tips%20and%20Tricks/ulimit.htm
>>>
>>> On Fri, Dec 4, 2015 at 7:52 PM, Mansi Sheth <mansi.sh...@gmail.com> wrote:
>>> > Hello,
>>> >
>>> > I am importing tons of XML files into BaseX.
Currently I have roughly >>> 1600 >>> > databases, I am starting basexhttp service, to access it over a web >>> service >>> > endpoint, thru a xquery file. Using BaseX 8.2.3. >>> > >>> > I am receiving below error: >>> > >>> > [bxerr:BXDB0002] Resource >>> "/veracode/msheth/BaseXDB/bi_output_713/inf.basex >>> > (Too many open files)" not found. >>> > >>> > basexhttp, is running with 10240M virtual memory. >>> > >>> > I can share the xquery file, if thats needed. >>> > >>> > Has anyone experienced this before ? Is there a limit on no of >>> databases >>> > supported by BaseX ? Is there some configuration option, which I can >>> use to >>> > close already queried database ? >>> > >>> > Thanks, >>> > - Mansi >>> >> >> >> >> -- >> - Mansi >> > > -- - Mansi
Re: [basex-talk] [bxerr:BXDB0002] Too many open files
Thanks Christian,

I had already set the open files limit on the OS:

ubuntu@ip-10-0-0-83:~$ ulimit -Hn

However, I still face the exact same problem. The process breaks at the same db count:

[bxerr:BXDB0002] Resource "/veracode/msheth/BaseXDB/bi_output_715/inf.basex (Too many open files)" not found.

On Sat, Dec 5, 2015 at 7:53 AM, Christian Grün <christian.gr...@gmail.com> wrote:
> Hi Mansi,
>
> If you are working with Linux, you may need to increase the maximum
> file limit with "ulimit -n" [1].
>
> Hope this helps,
> Christian
>
> [1] http://www.linuxhowtos.org/Tips%20and%20Tricks/ulimit.htm
>
> On Fri, Dec 4, 2015 at 7:52 PM, Mansi Sheth <mansi.sh...@gmail.com> wrote:
> > Hello,
> >
> > I am importing tons of XML files into BaseX. Currently I have roughly 1600
> > databases. I am starting the basexhttp service to access them over a web
> > service endpoint, through an XQuery file. Using BaseX 8.2.3.
> >
> > I am receiving the error below:
> >
> > [bxerr:BXDB0002] Resource "/veracode/msheth/BaseXDB/bi_output_713/inf.basex
> > (Too many open files)" not found.
> >
> > basexhttp is running with 10240M virtual memory.
> >
> > I can share the XQuery file if that's needed.
> >
> > Has anyone experienced this before? Is there a limit on the number of
> > databases supported by BaseX? Is there some configuration option which I
> > can use to close an already-queried database?
> >
> > Thanks,
> > - Mansi

--
- Mansi
[basex-talk] [bxerr:BXDB0002] Too many open files
Hello,

I am importing tons of XML files into BaseX. Currently I have roughly 1600 databases. I am starting the basexhttp service to access them over a web service endpoint, through an XQuery file. Using BaseX 8.2.3.

I am receiving the error below:

[bxerr:BXDB0002] Resource "/veracode/msheth/BaseXDB/bi_output_713/inf.basex (Too many open files)" not found.

basexhttp is running with 10240M virtual memory.

I can share the XQuery file if that's needed.

Has anyone experienced this before? Is there a limit on the number of databases supported by BaseX? Is there some configuration option which I can use to close an already-queried database?

Thanks,
- Mansi
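[Editorial sketch of checking and raising the per-process file-descriptor limit in the shell that launches basexhttp. A soft limit can be raised up to the hard limit without privileges; the persistent fix usually belongs in /etc/security/limits.conf or the service definition, which varies by distribution.]

```shell
# Show the current soft and hard limits for open file descriptors.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft=$soft hard=$hard"

# Raise the soft limit up to the hard limit for this shell and its
# children (e.g. a basexhttp process started from here).
if [ "$hard" != "unlimited" ]; then
  ulimit -Sn "$hard"
fi
echo "new soft=$(ulimit -Sn)"
```

Note that limits set this way apply only to the current shell and its children; a basexhttp started by an init system inherits that system's limits instead.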
[basex-talk] BaseX 8.2.3 data not shown until OS restart
Hello,

This is something very weird. I import a bunch of XML files into this latest version of BaseX using the Java API. Trying to access this data (via the basex command-line client or REST) doesn't show any databases at all. These databases show up only after restarting the OS. Is this a known issue, or just me?

- Mansi
Re: [basex-talk] Finding document based on filename
Thanks guys for all the expert comments. Currently, I am experimenting with the performance of just deleting and inserting using the Java API. If this process takes a tiny bit longer, I don't really care, is what I figured :) If it becomes unacceptable, I will use one of these suggestions. Thanks once again.

StringList databases = List.list(context);
String query = "";
for (String database : databases) {
  query = "db:list('" + database + "')";
  try {
    for (String fileName : query(query).split(" ")) {
      if (fileName.contains(XMLFileName.split("_")[1])) {
        query = "db:delete('" + database + "','" + fileName + "')";
        query(query);
        logger.info("Deleted " + fileName + " from " + database);
        retVal = true;
        break;
      }
    }
  } catch (BaseXException e) {
    e.printStackTrace();
  }
}

On Mon, Aug 31, 2015 at 9:45 PM, Martín Ferrari wrote:
> I forgot one thing: I got much better performance by just calling
> replace rather than delete and insert, but this is a db with more than one
> million records. If performance is not important, I believe either way will
> do.
>
> Martín.
>
> --
> From: ferrari_mar...@hotmail.com
> To: mansi.sh...@gmail.com; basex-talk@mailman.uni-konstanz.de
> Date: Mon, 31 Aug 2015 16:35:33 +
> Subject: Re: [basex-talk] Finding document based on filename
>
> Hi Mansi,
> I have a similar situation. I don't think there's a fast way to get
> documents by only knowing a part of their names. It seems you need to know
> the exact name. In my case, we might be able to group documents by a common
> id, so we might create subfolders inside the DB and store/get the contents
> of the subfolder directly, which is pretty fast.
> I've also tried indexing, but insertions got really slow (I assume
> maybe because indexing is not granular, it indexes all values) and we
> need performance.
>
> Oh, I've also tried using starts-with() instead of contains(), but it
> seems it does not pick up indexes.
>
> Martín.
> > -- > Date: Fri, 28 Aug 2015 16:52:37 -0400 > From: mansi.sh...@gmail.com > To: basex-talk@mailman.uni-konstanz.de > Subject: [basex-talk] Finding document based on filename > > Hello, > > I would be having 100s of databases, with each database having 100 XML > documents. I want to devise an algorithm, where given a part of XML file > name, i want to know which database(s) contains it, or null if document is > not currently present in any database. Based on that, add current document > into the database. This is to always maintain latest version of a document > in DB, and remove the older version, while adding newer version. > > So far, only way I could come up with is: > > for $db in all-databases: > open $db > $fileNames = list $db > for eachFileName in $fileNames: >if $eachFileName.contains(sub-xml filename): > add to ret-list-db > > return ret-list-db > > Above algorithm, seems highly inefficient, Is there any indexing, which > can be done ? Do you suggest, for each document insert, I should maintain a > separate XML document, which lists each file inserted etc. > > Once, i get hold of above list of db, I would be eventually deleting that > file and inserting a latest version of that file(which would have same > sub-xml file name). So, constant updating of this external document also > seems painful (Map be ?). > > Also, would it be faster, using XQUERY script files, thru java code, or > using Java API for such operations ? > > How do you all deal with such operations ? > > - Mansi > -- - Mansi
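[Editorial sketch of Martín's replace suggestion, in XQuery; the database and paths are made-up names. In BaseX 8.x, db:replace overwrites the document at the given path, or adds it if absent, so the delete-then-insert pair collapses into a single update:]

```xquery
(: replace (or add) one document in one atomic update :)
db:replace('mydb', 'docs/report_1234.xml', doc('/tmp/report_1234.xml'))
```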
[basex-talk] Finding document based on filename
Hello, I will have hundreds of databases, with each database holding around 100 XML documents. I want to devise an algorithm where, given part of an XML file name, I can find out which database(s) contain it, or get null if the document is not currently present in any database. Based on that, I add the current document to a database. The goal is to always maintain the latest version of a document in the DB, removing the older version while adding the newer one. So far, the only way I could come up with is:

    for $db in all-databases:
        open $db
        $fileNames = list $db
        for eachFileName in $fileNames:
            if $eachFileName.contains(sub-xml filename):
                add to ret-list-db
    return ret-list-db

The above algorithm seems highly inefficient. Is there any indexing that can be done? Do you suggest that for each document insert I maintain a separate XML document listing each file inserted, etc.? Once I get hold of the above list of DBs, I would eventually delete that file and insert the latest version of it (which would have the same sub-xml file name), so constant updating of this external document also seems painful (a map, maybe?). Also, would it be faster to use XQuery script files through Java code, or the Java API, for such operations? How do you all deal with such operations? - Mansi
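For what it's worth, the lookup described above can be written directly in XQuery (a sketch, assuming the partial file name is bound to an external variable $part; it still scans every database, so the cost is the same as the pseudocode):

```xquery
declare variable $part as xs:string external;

(: return every database that stores a resource whose path contains $part :)
for $db in db:list()
where some $path in db:list($db) satisfies contains($path, $part)
return $db
```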
Re: [basex-talk] XQuery Optimization suggestions
As part of preparing to present at XML Prague, I am working on a slide showing statistics. From the comments below, I started wondering whether it would be best to show time taken against the size of the DB or against the number of nodes. What do you all think? If I frame it in terms of number of nodes, would it be a closer comparison with other tools? E.g., 1 million records in a SQL database ~= 1 million nodes in BaseX, making the time-taken comparison closer to apples-to-apples. We are currently battling with this at work too: there are a few different approaches to data mining, for different data sources. I talk in terms of GBs of data in the database, and the SQL fans talk in terms of millions of records. It's hard to make any progress and push for NXDs. - Mansi On Sun, Jan 18, 2015 at 11:24 AM, Christian Grün christian.gr...@gmail.com wrote: Just finished processing 310GB of data, with a result set worth 11 million records, within 44 minutes. I am currently psyched with the potential of even BaseX supporting this kind of data. But I am no expert here. What are your views on these performance statistics? My assumption is that it basically boils down to a sequential scan of most of the elements in the database (so buying faster SSDs will probably be the safest choice to speed up your queries..). 310 GB is a lot, so 44 minutes is probably not that bad. Speaking for myself, though, I was sometimes surprised that other NoSQL systems I tried were not really faster than BaseX if you have hierarchical data structures and need to post-process large amounts of data. However, as your queries look pretty simple, you could also have a look at e.g. MongoDB or RethinkDB (provided that the data can be converted to JSON). Those systems give you convenient Big Data features like distribution/sharding or replication. But I'm also interested in what others say about this. 
Christian - Mansi On Sun, Jan 18, 2015 at 10:49 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, http://localhost:8984/rest?run=get_query.xqn=/Archives/*/descendant::c/descendant::a[contains(@name, xyz)]/@name/data() My guess is that most time is spent to parse all the nodes in the database. If you know more about the database structure, you could replace some of the descendant with explicit child steps. Apart from that, I guess I'm repeating myself, but have you tried to remove duplicates in XQuery, or do grouping and sorting in the language? Usually, it's recommendable to do as much as possible in XQuery itself (although it might not be obvious how to do this at first glance). Christian -- - Mansi -- - Mansi
[basex-talk] XQuery Optimization suggestions
Hello, I am doing some performance analysis on the size of XML files in the DB, the number of records in a result set, and how long it takes to get results. Currently, I have 150GB worth of XML documents imported into BaseX. It took roughly 21 minutes to return a result set worth 5.3 million records. Queries are of the form:

    http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::c/descendant::a[contains(@name,'xyz')]/@name/data()

XQuery file:

    for $db in db:list()
    (: assign dynamic variables to generate the query, to be used in eval :)
    let $query := "declare variable $db external; db:open($db)" || $n
    return xquery:eval($query, map { 'db': $db, 'query': $n })

I have a few questions around this. 1. I have routinely been advised on this mailing list to avoid serialization from XPath and let XQuery handle it. I tried a few things, like replacing string() with data(), adding a serialization option on the REST call, in the XQuery file, etc. But I don't see any performance gain. Is there something else I can try, or something I am doing wrong? 2. Does anyone have any resource comparing this performance to other NoSQL databases? I am just very curious how the above performance numbers compare to other DBs. - Mansi
Re: [basex-talk] XQuery Optimization suggestions
Structure of data is nested, so I have to write queries this way unfortunately. Also, I am doing performance analysis removing all external parameters like any kind of post-processing, network latency etc. Just isolating if I can do any better. So, guess this is the best I can do... No problem at all. Just finished processing 310GB of data, with result set worth 11 million records within 44 minutes. I am currently psyched with the potential of even BaseX supporting this kind of data. But I am no expert here. What are your views on this performance statistics ? - Mansi On Sun, Jan 18, 2015 at 10:49 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, http://localhost:8984/rest?run=get_query.xqn=/Archives/*/descendant::c/descendant::a[contains(@name, xyz)]/@name/data() My guess is that most time is spent to parse all the nodes in the database. If you know more about the database structure, you could replace some of the descendant with explicit child steps. Apart from that, I guess I'm repeating myself, but have you tried to remove duplicates in XQuery, or do grouping and sorting in the language? Usually, it's recommendable to do as much as possible in XQuery itself (although it might not be obvious how to do this at first glance). Christian -- - Mansi
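Christian's two suggestions above could be sketched roughly like this (the element names mirror the hypothetical /Archives structure from the thread; the point is replacing descendant steps with explicit child steps and deduplicating inside XQuery rather than in post-processing):

```xquery
(: child steps let BaseX skip unrelated subtrees; distinct-values
   deduplicates inside the query instead of in a client pipeline :)
distinct-values(
  for $a in /Archives/*/c/a[contains(@name, 'xyz')]
  return string($a/@name)
)
```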
Re: [basex-talk] Silly XQUERY exception
Lukas, That was it!! I feel like shooting myself. What an oversight. Thanks a ton for looking at it and spotting it. - Mansi On Thu, Jan 8, 2015 at 2:44 AM, Lukas Kircher lukaskirch...@gmail.com wrote: Hi Mansi, let $cmd := /A/*/descendant::C/*descandant*::*[contains(@name,'|| $n ||')]” Just a quick scan - I marked the problem in bold above - I would try ‘desc*e*ndant’ instead of 'desc*a*ndant’. Cheers, Lukas -- - Mansi
[basex-talk] Silly XQUERY exception
Hello, I feel very stupid and frustrated at not being able to fix this error. Below is my query code, which I am trying to run. I am passing the value for the contains clause through the command line, and I expect to receive the number of XML files matching the $cmd XPath. I always get:

    Stopped at /veracode/msheth/BaseXWeb/get_prevalence.xq, 16/12: *[XPST0003] Expecting ':=', found ':'.*

When run through REST: *[XPST0003] Expecting valid step, found 'd'.* I think it's the way $cmd is being set. I have tried a simple string concat using ||, using the concat() function, using HTML entities, etc.

    declare variable $n as xs:string external;
    declare option output:item-separator "&#xa;";
    (: let $cmd := concat("/A/*/descendant::C/descandant::*[contains(@name,'", $n, "')]") :)
    *let $cmd := "/A/*/descendant::C/descandant::*[contains(@name,'" || $n || "')]"*
    let $aPath :=
      for $db in db:list()
      let $query := "declare variable $db external; db:open($db)" || $cmd
      *return xquery:eval($query,*
      *  map { 'db': $db, 'query': $cmd })*
    let $clients :=
      for $elem in $aPath
      return db:path($elem)
    return ($n, distinct-values(count($clients)))

The lines of code that are the culprits are marked in bold above. Any and all suggestions are greatly appreciated. - Mansi
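For reference, once the misspelled axis is corrected (as Lukas points out in the reply above, "descandant" should be "descendant"), the offending line parses; a minimal sketch:

```xquery
declare variable $n as xs:string external;
(: the descendant axis spelled correctly; the string is later passed to xquery:eval :)
let $cmd := "/A/*/descendant::C/descendant::*[contains(@name,'" || $n || "')]"
return $cmd
```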
[basex-talk] Design finding almost duplicate xml files
Hello, I am trying to come up with a design which, just before inserting an XML file into the database, will warn us that an almost identical XML file (with a different name and different size) is already stored in the database. "Almost identical" would be based on a few elements of the XML file, such as:

    <root>
      <A name="">
        <B name="">
          <C name=""/>
          <C name=""/>
          .
          .
        </B>
      </A>
      .
      .
      .
    </root>

i.e., the same A and B from the above snippet but different C. Element A could be repeated hundreds of times in a single XML file. Any pointers? - Mansi
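One possible direction (my own sketch, not something proposed in the thread): before inserting, compute a fingerprint over only the elements that define "almost identical" (here the A and B names, ignoring C) and compare it with fingerprints of the stored documents. hash:md5 is from BaseX's Hashing Module; the element names are the placeholders from the snippet above:

```xquery
declare function local:fingerprint($doc as document-node()) as xs:string {
  (: hash only the parts that matter for (near-)equality: A and B names :)
  string(xs:hexBinary(hash:md5(
    string-join(
      $doc//A ! (string(@name) || '|' || string-join(B/@name ! string(.), ',')),
      ';'
    )
  )))
};

(: warn if any stored document shares the candidate's fingerprint :)
declare function local:near-duplicates(
  $db as xs:string, $candidate as document-node()
) as xs:string* {
  for $doc in db:open($db)
  where local:fingerprint($doc) = local:fingerprint($candidate)
  return db:path($doc)
};
```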
Re: [basex-talk] Out Of Memory
Hello, I wanted to get back to this email chain and share my experience. I got this running beautifully (including all post-processing of results) using the command below:

    curl -ig 'http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::D/@name/string()' | cut -d: -f1 | cut -d. -f1-3 | sort | uniq -c | sort -n -r

I am using the BaseX 8.0 beta 763cc93 build. Running this on an i7 2.7GHz MBP, giving 8GB to the basexhttp process, it took around 34 min on 41 GB of data. I think a lot of the time went into post-processing (sorting) the result set, rather than actually extracting the results from BaseX. When I tried a similar query on a much smaller database (3GB) on a much more powerful Amazon instance, giving 20GB RAM to the basexhttp process, I got results with post-processing within 4 mins. Thanks for all your input, guys. Keep BaseXing... !!! - Mansi On Fri, Nov 7, 2014 at 12:25 PM, Mansi Sheth mansi.sh...@gmail.com wrote: This email chain is extremely helpful. Thanks a ton, guys. Certainly some of the most helpful folks here :) I have to try a lot of these suggestions, but currently I am being pulled into something else, so I have to pause for the time being. Will get back to this email thread after trying a few things, with my relevant observations. - Mansi On Fri, Nov 7, 2014 at 3:48 AM, Fabrice Etanchaud fetanch...@questel.com wrote: Hi Mansi, From what I can see, for each pqr value, you could use db:attribute-range to retrieve all the file names, and group by/count to obtain statistics. You could also create a new collection from an extraction of only the data you need, changing @name into an element, and use full-text fuzzy match. Hoping it helps, Cordialement, Fabrice *De :* basex-talk-boun...@mailman.uni-konstanz.de [mailto: basex-talk-boun...@mailman.uni-konstanz.de] *De la part de* Mansi Sheth *Envoyé :* jeudi 6 novembre 2014 20:55 *À :* Christian Grün *Cc :* BaseX *Objet :* Re: [basex-talk] Out Of Memory I would be doing tons of post processing. I never use UI. 
I either use REST thru cURL or command line. I would basically need data in below format: XML File Name, @name I am trying to whitelist picking up values for only starts-with(@name,pqr). where pqr is a list of 150 odd values. My file names, are essentially some ID/keys, which I would need to map it further using sqlite to some values and may be group by it.. etc. So, basically I am trying to visualize some data, based on its existence in which xml files. So, yes count(query) would be fine, but won't solve much purpose, since I still need value pqr. - Mansi On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün christian.gr...@gmail.com wrote: Query: /A/*//E/@name/string() In the GUI, all results will be cached, so you could think about switching to command line. Do you really need to output all results, or do you do some further processing with the intermediate results? For example, the query count(/A/*//E/@name/string()) will probably run without getting stuck. This query, was going OOM, within few mins. I tried a few ways, of whitelisting, with contain clause, to truncate the result set. That didn't help too. So, now I am out of ideas. This is giving JVM 10GB of dedicated memory. Once, above query works and doesn't go Out Of Memory, I also need corresponding file names too: XYZ.xml //E/@name PQR.xml //E/@name Let me know if you would need more details, to appreciate the issue ? - Mansi On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, I think we need more information on the queries that are causing the problems. Best, Christian On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote: Hello, I have a use case, where I have to extract lots in information from each XML in each DB. Something like, attribute values of most of the nodes in an XML. For such, queries based goes Out Of Memory with below exception. I am giving it ~12GB of RAM on i7 processor. 
Well I can't complain here since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully ? mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp BaseX 8.0 beta b45c1e2 [Server] Server was started (port: 1984) HTTP Server was started (port: 8984) Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll
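The prefix whitelist described earlier in this thread (restricting @name to ~150 known "pqr" values, paired with the containing file name) could be expressed in XQuery roughly like this (a sketch; the element names and the prefix list are placeholders standing in for the real values):

```xquery
(: placeholder prefixes standing in for the ~150 real values :)
declare variable $prefixes := ('pqr', 'abc', 'xyz');

(: emit "file name,@name" pairs, keeping only whitelisted names :)
for $e in /A/*//E[some $p in $prefixes satisfies starts-with(@name, $p)]
return db:path($e) || ',' || string($e/@name)
```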
[basex-talk] data protection at rest
Hello, I am thinking about my options for protecting BaseX data at rest. We are storing sensitive client data which we need to protect. We will be migrating BaseX to an Amazon instance, so protection at rest is a primary concern. What I wish to do is: while inserting data into the database, it should be encrypted, and while querying with XQuery, we should have a mechanism to decrypt the data and retrieve the information needed. I am aware of the performance hit here; I would evaluate whether it is acceptable after collecting some statistics. I looked at the docs: http://docs.basex.org/wiki/Cryptographic_Module#Encryption_.26_Decryption But I didn't completely understand the use case for this example, or whether it would solve my purpose. I am currently using some Java code to insert files into the database. Has anyone done something along these lines? Please share some use cases. - Mansi
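Based on my reading of the Cryptographic Module docs linked above, a symmetric round trip could look like this (a sketch only; I believe crypto:encrypt and crypto:decrypt take the data, the keyword 'symmetric', a key, and an algorithm name, with AES expecting a 16-byte key, but verify the exact signatures against the docs before relying on them):

```xquery
(: 16-byte key for AES; in practice the key must be managed outside the query :)
declare variable $key := 'sixteen-byte-key';

let $secret := crypto:encrypt('sensitive client value', 'symmetric', $key, 'AES')
return crypto:decrypt($secret, 'symmetric', $key, 'AES')
```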
Re: [basex-talk] Distributed processing on roadmap ?
Sorry about the delay. I was busy preparing a presentation for my company on BaseX as our analytics solution. It was very well received; all thanks to you and everyone on this user list :) Based on my use cases, I believe (again, I am no expert in this domain) a map/reduce approach would work better. The result set being returned would contain at most a couple of thousand records, with some post-processing on it, as compared to the TBs of data being queried. If the querying and processing steps could use processing power from a cluster of nodes, maybe we might get a significant performance gain? What are your thoughts? What other use cases do you come across? - Mansi On Mon, Nov 17, 2014 at 10:50 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, it's nice to hear that you have been successfully scaling your database instances so far. I love using BaseX and the powers of BaseX. Currently I am able to query ~60GB of XML files under 2.5 mins. I still have a few more optimizations to try. I also see this data increasing to a couple of TB shortly. I would love to see this kind of processing become almost real time (within a min). So my question: are there any discussions around supporting distributed processing or clusters of nodes, etc.? Yes, distributed processing is a frequently discussed topic. One of our major questions is what challenge to solve first. As you surely know, there are so many different NoSQL stores out there, and all of them tackle different problems. Up to now, we spent most time on replication, but this would not give you better performance. So I would be interested to hear what kind of distribution techniques you believe would give you better performance. Do you think that a map/reduce approach would be helpful, or do you simply have lots of data that somehow needs to be sent to a client as quickly as possible? In other words, how large are your result sets? 
Do you really need the complete results, or would you rather like to draw some conclusions from the scanned data? Back to the current technology… Maybe you could do some Java profiling (using e.g. -Xrunhprof:cpu=samples) in order to find out what's the current bottleneck. Best, Christian -- - Mansi
Re: [basex-talk] Out Of Memory
This email chain, is extremely helpful. Thanks a ton guys. Certainly one of the most helpful folks here :) I have to try a lot of these suggestions but currently I am being pulled into something else, so I have to pause for the time being. Will get back to this email thread, after trying a few things and my relevant observations. - Mansi On Fri, Nov 7, 2014 at 3:48 AM, Fabrice Etanchaud fetanch...@questel.com wrote: Hi Mansi, From what I can see, for each pqr value, you could use db:attribute-range to retrieve all the file names, group by/count to obtain statistics. You could also create a new collection from an extraction of only the data you need, changing @name into element and use full text fuzzy match. Hoping it helps Cordialement Fabrice *De :* basex-talk-boun...@mailman.uni-konstanz.de [mailto: basex-talk-boun...@mailman.uni-konstanz.de] *De la part de* Mansi Sheth *Envoyé :* jeudi 6 novembre 2014 20:55 *À :* Christian Grün *Cc :* BaseX *Objet :* Re: [basex-talk] Out Of Memory I would be doing tons of post processing. I never use UI. I either use REST thru cURL or command line. I would basically need data in below format: XML File Name, @name I am trying to whitelist picking up values for only starts-with(@name,pqr). where pqr is a list of 150 odd values. My file names, are essentially some ID/keys, which I would need to map it further using sqlite to some values and may be group by it.. etc. So, basically I am trying to visualize some data, based on its existence in which xml files. So, yes count(query) would be fine, but won't solve much purpose, since I still need value pqr. - Mansi On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün christian.gr...@gmail.com wrote: Query: /A/*//E/@name/string() In the GUI, all results will be cached, so you could think about switching to command line. Do you really need to output all results, or do you do some further processing with the intermediate results? 
For example, the query count(/A/*//E/@name/string()) will probably run without getting stuck. This query, was going OOM, within few mins. I tried a few ways, of whitelisting, with contain clause, to truncate the result set. That didn't help too. So, now I am out of ideas. This is giving JVM 10GB of dedicated memory. Once, above query works and doesn't go Out Of Memory, I also need corresponding file names too: XYZ.xml //E/@name PQR.xml //E/@name Let me know if you would need more details, to appreciate the issue ? - Mansi On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, I think we need more information on the queries that are causing the problems. Best, Christian On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote: Hello, I have a use case, where I have to extract lots in information from each XML in each DB. Something like, attribute values of most of the nodes in an XML. For such, queries based goes Out Of Memory with below exception. I am giving it ~12GB of RAM on i7 processor. Well I can't complain here since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully ? 
mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp BaseX 8.0 beta b45c1e2 [Server] Server was started (port: 1984) HTTP Server was started (port: 8984) Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Thread.java:744) -- - Mansi -- - Mansi -- - Mansi -- - Mansi
Re: [basex-talk] Dynamic Evaluation of XQUERY
Christian, I am running out of ideas for debugging this. When I execute this query directly within the XQuery file, it works perfectly; only when I pass it through the command line does it break. In fact, the actual .xq file doesn't matter either; as you pointed out, parsing from the command line is what breaks. I tried the -d switch and escaping spaces, but that didn't help. Also, I tested that this is a valid XPath query. Please pardon my XQuery knowledge; it's really not my background. - Mansi On Thu, Nov 6, 2014 at 8:45 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, ~/Downloads/basex/bin/basex -bn='/Archives/*//class[contains(@name,abc) and contains(@name,pqr)]' get_paths.xq Stopped at /Users/mansiadmin/Documents/Research-Projects/BigData, 1/4: [XPDY0002] and: no context value bound. It seems that "and" was interpreted as an XPath step, so it seems as if something went wrong when parsing your query on the command line (I doubt that it's something specific to BaseX). Maybe you can simply try to output the query that causes the error, instead of trying to evaluate it? Christian However, the query below works like a charm: ~/Downloads/basex/bin/basex -bn='/Archives/*//class[contains(@name,abc)]' get_paths.xq I am hoping that, for the first query above, it's some syntactic issue at my end. But I couldn't fix it, so I thought I should point it out. Please advise. Code:

    declare variable $n as xs:string external;
    declare option output:item-separator "&#xa;";
    let $aPath :=
      for $db in db:list()
      let $query := "declare variable $db external; db:open($db)" || $n
      return xquery:eval($query, map { 'db': $db, 'query': $n })
    let $paths :=
      for $elem in $aPath
      return db:path($elem)
    return distinct-values($paths)

On Mon, Nov 3, 2014 at 6:48 PM, Christian Grün christian.gr...@gmail.com wrote: …in the meanwhile, could you please check if the bug has possibly been fixed in the latest 8.0 snapshot [1]? [1] http://files.basex.org/releases/latest On Tue, Nov 4, 2014 at 12:46 AM, Christian Grün christian.gr...@gmail.com wrote: Improper use? 
Potential bug? Your feedback is welcome: Sounds like a little bug indeed; I will check it tomorrow! Contact: basex-talk@mailman.uni-konstanz.de Version: BaseX 7.9 Java: Oracle Corporation, 1.7.0_45 OS: Mac OS X, x86_64 Stack Trace: java.lang.NullPointerException at org.basex.query.value.item.Str.get(Str.java:49) at org.basex.query.func.FNDb.path(FNDb.java:489) at org.basex.query.func.FNDb.item(FNDb.java:128) at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:45) at org.basex.query.func.FNDb.iter(FNDb.java:92) at org.basex.query.gflwor.GFLWOR$2.next(GFLWOR.java:78) at org.basex.query.MainModule$1.next(MainModule.java:98) at org.basex.core.cmd.AQuery.query(AQuery.java:91) at org.basex.core.cmd.XQuery.run(XQuery.java:22) at org.basex.core.Command.run(Command.java:329) at org.basex.core.Command.execute(Command.java:94) at org.basex.server.LocalSession.execute(LocalSession.java:121) at org.basex.server.Session.execute(Session.java:37) at org.basex.core.CLI.execute(CLI.java:106) at org.basex.BaseX.init(BaseX.java:123) at org.basex.BaseX.main(BaseX.java:42) On Thu, Oct 30, 2014 at 5:54 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, you have been close! It could work with the following query (I haven't tried it out, though):

    _ get_query_result.xq
    declare variable $n external;
    declare option output:item-separator "&#xa;";
    let $aList :=
      for $name in db:list()
      let $db := db:open($name)
      return xquery:eval($n, map { '': $db })
    return distinct-values($aList)
    __

In this code, I'm opening the database in the main loop, and I then bind it to the empty string. This way, the database will be the context of the query to be evaluated, and you won't have to deal with bugs that arise from the concatenation of db:open and the query string. 1. Can we assign dynamic values as a value to a map's key? 2. Can a map have more than one key, in xquery:eval? This is both possible. 
As you see in the following query, you'll again have to declare the variables that you want to bind. I agree this causes a lot of code, so we may simplify it again in a future version of BaseX:

    __
    let $n := "/a/b/c"
    for $db in db:list()
    let $query := "declare variable $db external; db:open($db)" || $n
    return xquery:eval($query, map { 'db': $db, 'query': $n })
    __

Best, Christian -- - Mansi
Re: [basex-talk] Out Of Memory
This needs a lot of details, so bear with me. Briefly, my XML files look like:

    <A name="">
      <B name="">
        <E name=""/>
      </B>
      <C name="">
        <E name=""/>
      </C>
      <D name="">
        <E name=""/>
      </D>
    </A>

A can contain B, C or D, and B, C or D can contain E. We have thousands (currently 3000 in my test data set) of such XML files, of 50MB on average. It's tons of data! Currently, my database is ~18GB in size. Query: /A/*//E/@name/string() This query was going OOM within a few minutes. I tried a few ways of whitelisting, with a contains clause, to truncate the result set. That didn't help either, so now I am out of ideas. I am giving the JVM 10GB of dedicated memory. Once the above query works and doesn't go Out Of Memory, I also need the corresponding file names too: XYZ.xml //E/@name PQR.xml //E/@name Let me know if you need more details to appreciate the issue. - Mansi On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, I think we need more information on the queries that are causing the problems. Best, Christian On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote: Hello, I have a use case where I have to extract lots of information from each XML file in each DB, something like the attribute values of most of the nodes in an XML file. Such queries go Out Of Memory with the exception below. I am giving it ~12GB of RAM on an i7 processor. Well, I can't complain here, since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully? 
mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp BaseX 8.0 beta b45c1e2 [Server] Server was started (port: 1984) HTTP Server was started (port: 8984) Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Thread.java:744) -- - Mansi -- - Mansi
Re: [basex-talk] Out Of Memory
Interesting idea. I thought of using db partition but didn't pursue it further, mainly due to the following thought process: currently I am ingesting ~3000 XML files, storing ~50 XML files per DB, and this will grow quickly. So the approach below would lead to ~3000 more files (and increasing), considerably increasing I/O operations for further pre-processing. However, I don't really care if the process takes a few minutes to a few hours (as long as it's not day(s) ;)). Given the situation and my options, I will surely try this. The database is currently indexed at the attribute level, as that's what I will query the most. Do you think I should do anything differently? Thanks, - Mansi On Thu, Nov 6, 2014 at 10:48 AM, Fabrice Etanchaud fetanch...@questel.com wrote: Hi Mansi, Here you have a natural partition of your data: the files you ingested. So my first suggestion would be to query your data on a file basis:

    for $doc in db:open('your_collection_name')
    let $file-name := db:path($doc)
    return file:write(
      $file-name,
      <names> {
        for $name in $doc//E/@name/data()
        return <name>{$name}</name>
      } </names>
    )

Is it for indexing? Hope it helps, Best regards, Fabrice Etanchaud Questel/Orbit *De :* basex-talk-boun...@mailman.uni-konstanz.de [mailto: basex-talk-boun...@mailman.uni-konstanz.de] *De la part de* Mansi Sheth *Envoyé :* jeudi 6 novembre 2014 16:33 *À :* Christian Grün *Cc :* BaseX *Objet :* Re: [basex-talk] Out Of Memory This would need a lot of details, so bear with me below: Briefly my XML files look like: A name= B name= C name= D name= E name=/ A can contain B, C or D and B, C or D can contain E. We have 1000s (currently 3000 in my test data set) of such xml files, of size 50MB on an average. Its tons of data ! Currently, my database is of ~18GB in size. Query: /A/*//E/@name/string() This query, was going OOM, within few mins. I tried a few ways, of whitelisting, with contain clause, to truncate the result set. That didn't help too. So, now I am out of ideas. 
This is giving JVM 10GB of dedicated memory. Once, above query works and doesn't go Out Of Memory, I also need corresponding file names too: XYZ.xml //E/@name PQR.xml //E/@name Let me know if you would need more details, to appreciate the issue ? - Mansi On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, I think we need more information on the queries that are causing the problems. Best, Christian On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote: Hello, I have a use case, where I have to extract lots in information from each XML in each DB. Something like, attribute values of most of the nodes in an XML. For such, queries based goes Out Of Memory with below exception. I am giving it ~12GB of RAM on i7 processor. Well I can't complain here since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully ? mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp BaseX 8.0 beta b45c1e2 [Server] Server was started (port: 1984) HTTP Server was started (port: 8984) Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Thread.java:744) -- - Mansi -- - Mansi -- - Mansi
Re: [basex-talk] Out Of Memory
I would be doing tons of post processing. I never use UI. I either use REST thru cURL or command line. I would basically need data in below format: XML File Name, @name I am trying to whitelist picking up values for only starts-with(@name,pqr). where pqr is a list of 150 odd values. My file names, are essentially some ID/keys, which I would need to map it further using sqlite to some values and may be group by it.. etc. So, basically I am trying to visualize some data, based on its existence in which xml files. So, yes count(query) would be fine, but won't solve much purpose, since I still need value pqr. - Mansi On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün christian.gr...@gmail.com wrote: Query: /A/*//E/@name/string() In the GUI, all results will be cached, so you could think about switching to command line. Do you really need to output all results, or do you do some further processing with the intermediate results? For example, the query count(/A/*//E/@name/string()) will probably run without getting stuck. This query, was going OOM, within few mins. I tried a few ways, of whitelisting, with contain clause, to truncate the result set. That didn't help too. So, now I am out of ideas. This is giving JVM 10GB of dedicated memory. Once, above query works and doesn't go Out Of Memory, I also need corresponding file names too: XYZ.xml //E/@name PQR.xml //E/@name Let me know if you would need more details, to appreciate the issue ? - Mansi On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, I think we need more information on the queries that are causing the problems. Best, Christian On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote: Hello, I have a use case, where I have to extract lots in information from each XML in each DB. Something like, attribute values of most of the nodes in an XML. For such, queries based goes Out Of Memory with below exception. I am giving it ~12GB of RAM on i7 processor. 
Well, I can't complain here since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully?

mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
BaseX 8.0 beta b45c1e2 [Server]
Server was started (port: 1984)
HTTP Server was started (port: 8984)
Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
	at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
	at java.lang.Thread.run(Thread.java:744)

-- - Mansi
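The file-name-plus-value output Mansi describes, combined with her starts-with whitelist, could be sketched in XQuery roughly as follows. The database name 'mydb' and the prefix list are hypothetical placeholders, and this has not been run against her data:

```xquery
(: hypothetical database name and whitelist prefixes :)
let $prefixes := ('pqr1', 'pqr2', 'pqr3')
for $doc in db:open('mydb')
for $name in $doc/A/*//E/@name/string()
where (some $p in $prefixes satisfies starts-with($name, $p))
return db:path($doc) || ',' || $name
```

Run from the command line rather than the GUI, so that results are streamed instead of being cached in memory.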
Re: [basex-talk] Dynamic Evaluation of XQUERY
Thanks Christian, the second query below worked beautifully. I am now trying to get the db:path of the dynamic query's results. Code:

  declare variable $n as xs:string external;
  declare option output:item-separator "&#xa;";

  let $aPath :=
    for $db in db:list()
    let $query := "declare variable $db external;" || "db:open($db)" || $n
    return xquery:eval($query, map { 'db': $db, 'query': $n })
  for $elem in $aPath
  return db:path($elem)

and I am getting the below exception when it is called:

mansi@work:BigData mansiadmin$ basex -b\$n='query' get_paths.xq
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 7.9
Java: Oracle Corporation, 1.7.0_45
OS: Mac OS X, x86_64
Stack Trace:
java.lang.NullPointerException
	at org.basex.query.value.item.Str.get(Str.java:49)
	at org.basex.query.func.FNDb.path(FNDb.java:489)
	at org.basex.query.func.FNDb.item(FNDb.java:128)
	at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:45)
	at org.basex.query.func.FNDb.iter(FNDb.java:92)
	at org.basex.query.gflwor.GFLWOR$2.next(GFLWOR.java:78)
	at org.basex.query.MainModule$1.next(MainModule.java:98)
	at org.basex.core.cmd.AQuery.query(AQuery.java:91)
	at org.basex.core.cmd.XQuery.run(XQuery.java:22)
	at org.basex.core.Command.run(Command.java:329)
	at org.basex.core.Command.execute(Command.java:94)
	at org.basex.server.LocalSession.execute(LocalSession.java:121)
	at org.basex.server.Session.execute(Session.java:37)
	at org.basex.core.CLI.execute(CLI.java:106)
	at org.basex.BaseX.init(BaseX.java:123)
	at org.basex.BaseX.main(BaseX.java:42)

On Thu, Oct 30, 2014 at 5:54 AM, Christian Grün christian.gr...@gmail.com wrote:

  Hi Mansi, you have been close!
It could work with the following query (I haven't tried it out, though):

  ____________ get_query_result.xq ____________

  declare variable $n external;
  declare option output:item-separator "&#xa;";

  let $aList :=
    for $name in db:list()
    let $db := db:open($name)
    return xquery:eval($n, map { '': $db })
  return distinct-values($aList)
  _____________________________________________

In this code, I'm opening the database in the main loop, and I then bind it to the empty string. This way, the database will be the context of the query to be evaluated, and you won't have to deal with bugs that arise from the concatenation of db:open and the query string.

  1. Can we assign dynamic values as a value to a map's key?
  2. Can a map have more than one key in xquery:eval?

Both are possible. As you see in the following query, you'll again have to declare the variables that you want to bind. I agree this causes a lot of code, so we may simplify it again in a future version of BaseX:

  _____________________________________________
  let $n := "/a/b/c"
  for $db in db:list()
  let $query := "declare variable $db external;" || "db:open($db)" || $n
  return xquery:eval($query, map { 'db': $db, 'query': $n })
  _____________________________________________

Best, Christian

-- - Mansi
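For Mansi's follow-up goal of also getting the db:path of each hit, the evaluated query has to return database nodes rather than strings, since db:path expects a node. A sketch under that assumption (untested, and $n is assumed to be a node-returning path such as //E, not //E/@name/string()):

```xquery
declare variable $n external;  (: assumed to yield nodes, e.g. //E :)
for $db in db:list()
for $elem in xquery:eval($n, map { '': db:open($db) })
return db:path($elem)
```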
[basex-talk] Dynamic Evaluation of XQUERY
Hello, I want to devise a generic XQuery script which accepts the actual XPath to be run across all documents in all databases from the command line. Something like:

  curl -i "http://localhost:8984/rest?run=get_query_result.xq&n=/root/*//calls/@name/string()"

Basically, parameter n would hold the query. I am trying this using the XQuery module's eval method, but am not succeeding. I have my script as something like:

  declare variable $n as xs:string external;
  declare option output:item-separator "&#xa;";

  let $aList :=
    for $db in db:list()
    let $vars := map { 'db_name': $db, 'query': $n }
    let $query_to_execute := "db:open($db_name)" || $query
    return xquery:eval($query_to_execute, $vars)
  return distinct-values($aList)

My questions are:
1. Can we assign dynamic values as a value to a map's key?
2. Can a map have more than one key in xquery:eval?

Please point me in the right direction, or explain what I am doing wrong in the above code. - Mansi
Re: [basex-talk] Architecture Question
Christian, thanks for all your responses. It truly helps a lot.

Re: importing data into databases: I realized that, for the extent of this POC, I will just count the number of docs in each database (currently programmed to be 50) and keep creating new databases. The structure of the data is the same, but it is nested in nature: a folder can have a folder, which can have a file, etc. Usually, it won't be more than 4 levels deep. That's a good tip, to guess the number of nodes based on byte size. I guess, for the time being, I will move on with just storing 50 docs per DB.

Re: terabytes of data: well, I am planning on using ~6 months' worth of data for any analysis and discarding data prior to that (leaving it around in backups). Obviously, I would be going some cloud route for such resources; we'll see how much budget I can manage to get :) I am very positive about this. So, no, it's not only a theoretical assumption as far as I can see.

Re: querying these databases: currently, I am exploring REST for it. From the documentation, it seems our only option is supporting these queries (on the server side) using XQuery or RESTXQ, no Java/Python? I am well versed with XPath and XSLT, and gearing up towards XQuery now. But it would be a little easier (just my personal preference :)) to manipulate data in Java/Python before serving it back to the client. Is there any such facility? Something like http://localhost:8984/rest?run=getData.java and similarly for Python?

- Mansi

Some preliminary statistics: imported 2050 XML documents in 22 min (including indexing on attributes).

On Sun, Oct 19, 2014 at 6:14 PM, Christian Grün christian.gr...@gmail.com wrote:

  Hi Mansi,

  Is there some book/resource you can point me to, which helps better visualize NXD?

  Sorry for letting you wait. If you want to know more about native XML databases, I recommend having a closer look at various articles in our Wiki (e.g. [1,2]). It will also be helpful if you get into the basics of XQuery [3].
Have you tried to realize some of the hints I gave in my previous mails?

  I am trying to distribute data across multiple databases. I can't distribute based on day, as there could very well be a situation where a single day's data is more than the capacity of a BaseX DB.

If 2 billion XML nodes per day are not enough, you will probably need to create more than one database per day. Via the info db command, you can see how many nodes are currently stored in a database, but there is no cheap solution to find out the number of nodes of an incoming document, because XML documents can be very heterogeneous. Some questions back:

* Do you have some more information on the data you want to store?
* Are all documents similar, or do they vary greatly? If the documents are somewhat similar, you can usually estimate the number of nodes by looking at the byte size.
* Do you know that you will really need to store lots of terabytes of XML data, or is it more of a theoretical assumption?

Christian

[1] http://docs.basex.org/wiki/Database
[2] http://docs.basex.org/wiki/Table_of_Contents
[3] http://docs.basex.org/wiki/Xquery

-- - Mansi
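As a sketch of the info db hint: the node count of an existing database can also be obtained from XQuery, though counting is not free on large databases. The database name here is a hypothetical placeholder:

```xquery
(: 'mydb' is a placeholder; descendant-or-self::node() counts document,
   element, text, comment and PI nodes, so the figure is close to, but may
   differ slightly from, the INFO DB node count :)
count(db:open('mydb')/descendant-or-self::node())
```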
Re: [basex-talk] web application on a local installation ?
EM, are you still facing this? I too installed using Homebrew, started the service using basexhttp, and made an HTTP request:

  http://localhost:8984/rest/db_name?query=query

and everything worked fine. Is your database on the local machine from where you started basexhttp? I initially had my database on an external HDD, and the service wasn't starting. Hope this helps, - Mansi

On Thu, Oct 9, 2014 at 3:51 AM, Emmanuelle Morlock emmanuelle.morl...@mom.fr wrote:

  Hi, sorry to ask a basic question, but I'm a newbie who doesn't always understand all the prerequisites in the documentation. If I can't find help here, just tell me. I tried to install a local instance of BaseX and use the web application features. I work on a Mac and installed BaseX with Homebrew. After changing the password of the admin user and creating a database, I launched the httpserver and typed http://localhost:8984/ in my browser. The result is:

    HTTP ERROR: 503
    Problem accessing /webapp. Reason: Service Unavailable

  What am I missing? Is it even possible to use a web app on a local computer? Thanks in advance for your help... EM

-- - Mansi
Re: [basex-talk] Architecture Question
I am trying to distribute data across multiple databases. I can't distribute based on day, as there could very well be a situation where a single day's data is more than the capacity of a BaseX DB. From the statistics page, the only other way I can distribute is based on the number of nodes. But going with that, I am not able to find a way to access the number of nodes in a DB programmatically. Further, I am clueless whether I can even find the number of nodes of the current doc to be imported. So:

  currentDocToImport = a.xml
  NodeNo(a.xml) = ??
  NumberOfNodes(LastDB) = ??

Do you agree that this is even a way to go? Can someone give me pointers on how to find the above 2 values? Any other thoughts are always welcome... - Mansi

On Tue, Oct 7, 2014 at 5:35 AM, Christian Grün christian.gr...@gmail.com wrote:

  Dear Mansi,

  1. I have 1000s of XML files (each between 50 MB and 400 MB) and this is going to grow exponentially (~200 per day). So, my question is: how scalable is BaseX? Can I configure it to use data from my external HDD in my initial prototype?

  So this means you want to add approx. 40 GB of XML files per day, right, amounting to 14 TB/year? This sounds like quite a lot indeed. You can have a look at our statistics page [1]; it gives you some insight into the current limits of BaseX. However, all limits are per single database. You can distribute your data across multiple databases and address multiple databases with a single XPath/XQuery request. For example, you could create a new database every day and run a query over all these databases:

    for $db in db:list()
    return db:open($db)/path/to/your/data

  2. I plan to heavily use XPath for data retrieval. Does BaseX use any multi-processing or multi-threading to speed up search? Any concurrent processing?

  Read-only requests will automatically be multithreaded. If a single query leads to heavy I/O requests, it may be that single-threaded processing will give you better results (because hard drives are often not very good at reading data in parallel).

  3. Can I do some post-processing on searched and retrieved data? Like sorting, unique elements, etc.?

  With XQuery (3.0), you can do virtually anything with your data. In most of our data-driven scenarios, all data processing is completely done in BaseX. Some plain examples can be found in our Wiki [2].

  Hope this helps, Christian

  [1] http://docs.basex.org/wiki/Statistics
  [2] http://docs.basex.org/wiki/XQuery_3.0

-- - Mansi
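The post-processing in point 3 (sorting, unique elements) can be sketched directly in XQuery across all databases; this is an illustrative query, not one from the thread:

```xquery
(: distinct @name values across every database, sorted alphabetically :)
for $name in distinct-values(
  for $db in db:list()
  return db:open($db)//@name/string()
)
order by $name
return $name
```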
Re: [basex-talk] Architecture Question
Christian, so, going ahead with my POC and the use cases we plan to solve, I have a few more database architecture questions:

1. Is there a way we can have a table with multiple columns, where one of the columns would be an ID and the others would be different XML information for that ID?
2. Can I map the above table to a relational table, to perform join queries on the ID?

Thanks, - Mansi

On Wed, Oct 8, 2014 at 12:53 PM, Christian Grün christian.gr...@gmail.com wrote:

  I just created a single database with ~190 XML files of 8.5 GB total, and activated indexes as well. Creating the database using basexgui took close to an hour. Running a simple XQuery took ~3 min. The database was created on an external USB 3.0 HDD. I will obviously be creating new databases across drives (if this POC is successful, I will surely go for the cloud) to scale it. For the time being, any and all tips to optimize performance are welcome.

  Indeed, performance should be much better if databases are created and queried on HDs or SSDs. Feel free to send us your queries if execution time is not good enough.

  Maybe I will soon contribute to the statistics pages :)

  Thanks, Christian

-- - Mansi
Re: [basex-talk] Architecture Question
On Fri, Oct 10, 2014 at 10:31 AM, Christian Grün christian.gr...@gmail.com wrote:

  Hi Mansi, out of interest: why don't you simply store all documents in the database and use the document path as the ID?

I am storing deeply nested, hierarchical data in XML files. Simply put, most of my queries are going to be relative (e.g. //@name). So, I am assuming it would be a huge performance hit, especially when I know each ID will most definitely have multiple XML documents. Correct me if I am wrong here.

  As BaseX is a native XML store, there is no way to store data in structures like tables. However, due to the flexibility of XML structures, the usual way is to create another document or database that contains the ID and additional metadata.

I don't know if I follow you completely here. Is there some metadata information I can use which maps each XML file stored in the NXD to the relational database you discussed above?

  Best, Christian

-- - Mansi
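Christian's suggestion of a separate metadata document could look roughly like this. The database names, the <entry> record structure, and the use of the document path as the join key are all assumptions made for illustration:

```xquery
(: 'data' holds the XML documents; 'meta' is a hypothetical database of
   <entry id="..."><label>...</label></entry> records keyed by document path :)
for $doc in db:open('data')
let $id := db:path($doc)
return db:open('meta')//entry[@id = $id]/label/string()
```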
Re: [basex-talk] Architecture Question
Thanks Christian. Re: size of data, I am hoping some days would be quieter than discussed below, but yes, it's going to be a lot of data. I just created a single database with ~190 XML files of 8.5 GB total, and activated indexes as well. Creating the database using basexgui took close to an hour. Running a simple XQuery took ~3 min. The database was created on an external USB 3.0 HDD. I will obviously be creating new databases across drives (if this POC is successful, I will surely go for the cloud) to scale it. For the time being, any and all tips to optimize performance are welcome. Maybe I will soon contribute to the statistics pages :) - Mansi

On Tue, Oct 7, 2014 at 5:35 AM, Christian Grün christian.gr...@gmail.com wrote:

  Dear Mansi,

  1. I have 1000s of XML files (each between 50 MB and 400 MB) and this is going to grow exponentially (~200 per day). So, my question is: how scalable is BaseX? Can I configure it to use data from my external HDD in my initial prototype?

  So this means you want to add approx. 40 GB of XML files per day, right, amounting to 14 TB/year? This sounds like quite a lot indeed. You can have a look at our statistics page [1]; it gives you some insight into the current limits of BaseX. However, all limits are per single database. You can distribute your data across multiple databases and address multiple databases with a single XPath/XQuery request. For example, you could create a new database every day and run a query over all these databases:

    for $db in db:list()
    return db:open($db)/path/to/your/data

  2. I plan to heavily use XPath for data retrieval. Does BaseX use any multi-processing or multi-threading to speed up search? Any concurrent processing?

  Read-only requests will automatically be multithreaded. If a single query leads to heavy I/O requests, it may be that single-threaded processing will give you better results (because hard drives are often not very good at reading data in parallel).

  3. Can I do some post-processing on searched and retrieved data? Like sorting, unique elements, etc.?

  With XQuery (3.0), you can do virtually anything with your data. In most of our data-driven scenarios, all data processing is completely done in BaseX. Some plain examples can be found in our Wiki [2].

  Hope this helps, Christian

  [1] http://docs.basex.org/wiki/Statistics
  [2] http://docs.basex.org/wiki/XQuery_3.0

-- - Mansi
[basex-talk] Architecture Question
Hello, I have been going through and comparing different native XML databases, and so far I am liking BaseX. However, there are still a few questions unanswered before I make a final choice:

1. I have 1000s of XML files (each between 50 MB and 400 MB) and this is going to grow exponentially (~200 per day). So, my question is: how scalable is BaseX? Can I configure it to use data from my external HDD in my initial prototype?

2. I plan to heavily use XPath for data retrieval. Does BaseX use any multi-processing or multi-threading to speed up search? Any concurrent processing?

3. Can I do some post-processing on searched and retrieved data? Like sorting, unique elements, etc.?

- Mansi