Re: [basex-talk] Runtime Exception

2016-05-03 Thread Mansi Sheth
I upgraded to the latest BaseX version, but I am still getting runtime errors
such as the following:

HTTP/1.1 400 Bad Request
Content-Type: text/plain;charset=UTF-8
Content-Length: 3222
Server: Jetty(8.1.18.v20150929)

Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.4.4
Java: Oracle Corporation, 1.7.0_95
OS: Linux, amd64
Stack Trace:
java.lang.NullPointerException
at org.basex.data.DiskData.write(DiskData.java:120)
at org.basex.data.DiskData.close(DiskData.java:140)
at org.basex.core.Datas.unpin(Datas.java:53)
at org.basex.core.cmd.Close.close(Close.java:45)
at org.basex.query.QueryResources.close(QueryResources.java:108)
at org.basex.query.QueryContext.close(QueryContext.java:603)
at org.basex.query.QueryProcessor.close(QueryProcessor.java:262)
at org.basex.core.cmd.AQuery.query(AQuery.java:99)
at org.basex.core.cmd.XQuery.run(XQuery.java:22)
at org.basex.core.Command.run(Command.java:398)
at org.basex.http.rest.RESTCmd.run(RESTCmd.java:99)
at org.basex.http.rest.RESTQuery.query(RESTQuery.java:74)
at org.basex.http.rest.RESTRun.run0(RESTRun.java:41)
at org.basex.http.rest.RESTCmd.run(RESTCmd.java:65)
at org.basex.core.Command.run(Command.java:398)
at org.basex.core.Command.execute(Command.java:100)
at org.basex.core.Command.execute(Command.java:123)
at org.basex.http.rest.RESTServlet.run(RESTServlet.java:22)
at org.basex.http.BaseXServlet.service(BaseXServlet.java:64)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:231)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)


On Fri, Apr 29, 2016 at 1:49 PM, Christian Grün <christian.gr...@gmail.com>
wrote:

> Hi Mansi,
>
> This error shouldn’t show up anymore with more recent versions of
> BaseX. Could you try the latest version?
>
> Regarding the new error, could you please tell us more about what you
> were doing with your data? Did you read and write at the same time?
> Did you use different BaseX instances to access the data in parallel?
>
> Thanks
> Christian
>
>
> On Fri, Apr 29, 2016 at 6:28 PM, Mansi Sheth <mansi.sh...@gmail.com>
> wrote:
> > Hello,
> >
> > So, now I am stuck. I am not even able to access any database:
> >
> > ubuntu@ip-10-0-0-83:~$ basex
> > BaseX 8.2.3 [Standalone]
> > Try help to get more information.
> >
> >> list
> > Improper use? Potential bug? Your feedback is welcome:
> > Contact: basex-talk@mailman.uni-konstanz.de
> > Version: BaseX 8.2.3
> > Java: Oracle Corporation, 1.7.0_95
> > OS: Linux, amd64
> > Stack Trace:
> > java.lang.ArrayIndexOutOfBoundsException: 0
> > at org.basex.util.Version.<init>(Version.java:33)
> > at org.basex.util.Version.<init>(Version.java:24)
> > at org.basex.dat

Re: [basex-talk] Runtime Exception

2016-04-29 Thread Mansi Sheth
Hello,

So, now I am stuck. I am not even able to access any database:

ubuntu@ip-10-0-0-83:~$ basex
BaseX 8.2.3 [Standalone]
Try help to get more information.

> list
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.2.3
Java: Oracle Corporation, 1.7.0_95
OS: Linux, amd64
Stack Trace:
java.lang.ArrayIndexOutOfBoundsException: 0
at org.basex.util.Version.<init>(Version.java:33)
at org.basex.util.Version.<init>(Version.java:24)
at org.basex.data.MetaData.read(MetaData.java:315)
at org.basex.data.MetaData.read(MetaData.java:262)
at org.basex.core.cmd.List.list(List.java:83)
at org.basex.core.cmd.List.run(List.java:52)
at org.basex.core.Command.run(Command.java:398)
at org.basex.core.Command.execute(Command.java:100)
at org.basex.api.client.LocalSession.execute(LocalSession.java:132)
at org.basex.api.client.Session.execute(Session.java:36)
at org.basex.core.CLI.execute(CLI.java:103)
at org.basex.core.CLI.execute(CLI.java:87)
at org.basex.BaseX.console(BaseX.java:191)
at org.basex.BaseX.<init>(BaseX.java:166)
at org.basex.BaseX.main(BaseX.java:42)

>


On Tue, Apr 26, 2016 at 12:14 PM, Christian Grün <christian.gr...@gmail.com>
wrote:

> Hi Mansi,
>
> Thanks for the feedback. Errors like this sometime occur if databases
> are requested from different JVMs at the same time. See e.g. [1] for
> more information.
>
> Cheers
> Christian
>
> [1] http://docs.basex.org/wiki/Startup#Concurrent_Operations
>
>
> On Tue, Apr 26, 2016 at 6:02 PM, Mansi Sheth <mansi.sh...@gmail.com>
> wrote:
> > I did try the INSPECT command on all databases, which showed there are no
> > inconsistencies. I was logging any exceptions in my Java code, and that
> > showed me which database in particular had the problem; dropping it helped.
> >
> > This was the first time I saw it. I was worried the DB had grown past the
> > point of being supported, which is when I panicked.
> >
> > Thanks,
> > - Mansi
> >
> > On Tue, Apr 26, 2016 at 3:27 AM, Christian Grün <
> christian.gr...@gmail.com>
> > wrote:
> >>
> >> Dear Mansi,
> >>
> >> you could try to run the INSPECT command on the affected database, or
> all
> >> databases, in order to find out if your database has gone corrupt. Did
> you
> >> repeatedly come across this error?
> >>
> >> Best,
> >> Christian
> >>
> >> Am 25.04.2016 16:45 schrieb "Mansi Sheth" <mansi.sh...@gmail.com>:
> >> >
> >> > Hello,
> >> >
> >> > My current BaseX data directory is at 920 GB, with ~230 databases. I
> >> > run the Jetty server via the basexhttp script, giving it an explicit
> >> > 30 GB of RAM. While running a query through the REST API (XQuery), I
> >> > get the error below.
> >> >
> >> > HTTP/1.1 400 Bad Request
> >> > Content-Type: text/plain;charset=UTF-8
> >> > Content-Length: 4207
> >> > Server: Jetty(8.1.16.v20140903)
> >> >
> >> > Improper use? Potential bug? Your feedback is welcome:
> >> > Contact: basex-talk@mailman.uni-konstanz.de
> >> > Version: BaseX 8.2.3
> >> > Java: Oracle Corporation, 1.7.0_95
> >> > OS: Linux, amd64
> >> > Stack Trace:
> >> > java.lang.RuntimeException: Data Access out of bounds:
> >> > - pre value: 126882320
> >> > - #used blocks: 495643
> >> > - #total locks: 495643
> >> > - access: 495642 (495643 > 495642]
> >> > at org.basex.util.Util.notExpected(Util.java:60)
> >> > at org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:458)
> >> > at org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:148)
> >> > at org.basex.data.Data.kind(Data.java:306)
> >> > at org.basex.query.value.node.DBNode.<init>(DBNode.java:51)
> >> > at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:68)
> >> > at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:22)
> >> > at org.basex.query.value.seq.Seq$1.next(Seq.java:77)
> >> > at org.basex.query.expr.path.IterPath$1.next(IterPath.java:58)
> >> > at org.basex.query.expr.path.IterPath$1.next(IterPath.java:36)
> >> > at org.basex.query.MainModule$1.next(MainModule.java:114)
> >> > at org.basex.query.func.StandardFunc.cache(StandardFunc.java:384)

Re: [basex-talk] Runtime Exception

2016-04-26 Thread Mansi Sheth
I did try the INSPECT command on all databases, which showed there are no
inconsistencies. I was logging any exceptions in my Java code in case of
errors, and that showed me which database in particular had the problem;
dropping it helped.

This was the first time I saw it. I was worried the DB had grown past the
point of being supported, which is when I panicked.

Thanks,
- Mansi

On Tue, Apr 26, 2016 at 3:27 AM, Christian Grün <christian.gr...@gmail.com>
wrote:

> Dear Mansi,
>
> you could try to run the INSPECT command on the affected database, or all
> databases, in order to find out if your database has gone corrupt. Did you
> repeatedly come across this error?
>
> Best,
> Christian
>
> Am 25.04.2016 16:45 schrieb "Mansi Sheth" <mansi.sh...@gmail.com>:
> >
> > Hello,
> >
> > My current BaseX data directory is at 920 GB, with ~230 databases. I run
> > the Jetty server via the basexhttp script, giving it an explicit 30 GB of
> > RAM. While running a query through the REST API (XQuery), I get the error
> > below.
> >
> > HTTP/1.1 400 Bad Request
> > Content-Type: text/plain;charset=UTF-8
> > Content-Length: 4207
> > Server: Jetty(8.1.16.v20140903)
> >
> > Improper use? Potential bug? Your feedback is welcome:
> > Contact: basex-talk@mailman.uni-konstanz.de
> > Version: BaseX 8.2.3
> > Java: Oracle Corporation, 1.7.0_95
> > OS: Linux, amd64
> > Stack Trace:
> > java.lang.RuntimeException: Data Access out of bounds:
> > - pre value: 126882320
> > - #used blocks: 495643
> > - #total locks: 495643
> > - access: 495642 (495643 > 495642]
> > at org.basex.util.Util.notExpected(Util.java:60)
> > at org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:458)
> > at org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:148)
> > at org.basex.data.Data.kind(Data.java:306)
> > at org.basex.query.value.node.DBNode.<init>(DBNode.java:51)
> > at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:68)
> > at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:22)
> > at org.basex.query.value.seq.Seq$1.next(Seq.java:77)
> > at org.basex.query.expr.path.IterPath$1.next(IterPath.java:58)
> > at org.basex.query.expr.path.IterPath$1.next(IterPath.java:36)
> > at org.basex.query.MainModule$1.next(MainModule.java:114)
> > at org.basex.query.func.StandardFunc.cache(StandardFunc.java:384)
> > at org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:129)
> > at org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:59)
> > at org.basex.query.func.xquery.XQueryEval.value(XQueryEval.java:49)
> > at org.basex.query.expr.gflwor.GFLWOR.value(GFLWOR.java:77)
> > at org.basex.query.QueryContext.value(QueryContext.java:421)
> > at org.basex.query.expr.gflwor.Let$LetEval.next(Let.java:187)
> > at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:95)
> > at org.basex.query.MainModule$1.next(MainModule.java:114)
> > at org.basex.core.cmd.AQuery.query(AQuery.java:91)
> > at org.basex.core.cmd.XQuery.run(XQuery.java:22)
> > at org.basex.core.Command.run(Command.java:398)
> > at org.basex.http.rest.RESTCmd.run(RESTCmd.java:99)
> > at org.basex.http.rest.RESTQuery.query(RESTQuery.java:74)
> > at org.basex.http.rest.RESTRun.run0(RESTRun.java:41)
> > at org.basex.http.rest.RESTCmd.run(RESTCmd.java:65)
> > at org.basex.core.Command.run(Command.java:398)
> > at org.basex.core.Command.execute(Command.java:100)
> > at org.basex.core.Command.execute(Command.java:123)
> > at org.basex.http.rest.RESTServlet.run(RESTServlet.java:22)
> > at org.basex.http.BaseXServlet.service(BaseXServlet.java:64)
> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
> > at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
> > at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
> > at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> > at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> > at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> > at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
> > at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:

[basex-talk] Runtime Exception

2016-04-25 Thread Mansi Sheth
Hello,

My current BaseX data directory is at 920 GB, with ~230 databases. I run the
Jetty server via the basexhttp script, giving it an explicit 30 GB of RAM.
While running a query through the REST API (XQuery), I get the error below.

HTTP/1.1 400 Bad Request
Content-Type: text/plain;charset=UTF-8
Content-Length: 4207
Server: Jetty(8.1.16.v20140903)

Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.2.3
Java: Oracle Corporation, 1.7.0_95
OS: Linux, amd64
Stack Trace:
java.lang.RuntimeException: Data Access out of bounds:
- pre value: 126882320
- #used blocks: 495643
- #total locks: 495643
- access: 495642 (495643 > 495642]
at org.basex.util.Util.notExpected(Util.java:60)
at org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:458)
at org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:148)
at org.basex.data.Data.kind(Data.java:306)
at org.basex.query.value.node.DBNode.<init>(DBNode.java:51)
at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:68)
at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:22)
at org.basex.query.value.seq.Seq$1.next(Seq.java:77)
at org.basex.query.expr.path.IterPath$1.next(IterPath.java:58)
at org.basex.query.expr.path.IterPath$1.next(IterPath.java:36)
at org.basex.query.MainModule$1.next(MainModule.java:114)
at org.basex.query.func.StandardFunc.cache(StandardFunc.java:384)
at org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:129)
at org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:59)
at org.basex.query.func.xquery.XQueryEval.value(XQueryEval.java:49)
at org.basex.query.expr.gflwor.GFLWOR.value(GFLWOR.java:77)
at org.basex.query.QueryContext.value(QueryContext.java:421)
at org.basex.query.expr.gflwor.Let$LetEval.next(Let.java:187)
at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:95)
at org.basex.query.MainModule$1.next(MainModule.java:114)
at org.basex.core.cmd.AQuery.query(AQuery.java:91)
at org.basex.core.cmd.XQuery.run(XQuery.java:22)
at org.basex.core.Command.run(Command.java:398)
at org.basex.http.rest.RESTCmd.run(RESTCmd.java:99)
at org.basex.http.rest.RESTQuery.query(RESTQuery.java:74)
at org.basex.http.rest.RESTRun.run0(RESTRun.java:41)
at org.basex.http.rest.RESTCmd.run(RESTCmd.java:65)
at org.basex.core.Command.run(Command.java:398)
at org.basex.core.Command.execute(Command.java:100)
at org.basex.core.Command.execute(Command.java:123)
at org.basex.http.rest.RESTServlet.run(RESTServlet.java:22)
at org.basex.http.BaseXServlet.service(BaseXServlet.java:64)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)


-- 
- Mansi


[basex-talk] XQuery Help

2016-01-13 Thread Mansi Sheth
Hello,

I need help with what is hopefully a simple XQuery.

I want to extend the XQuery below to run not on all documents in all
databases, but only on those documents whose db:path (i.e. the original file
name) contains "input string X".

Any help appreciated.

declare variable $n as xs:string external; (: command-line query, entered as variable "n" :)
declare option output:item-separator "&#10;"; (: each item on a new line :)

(: Run input query, on every XML document in every Database:)
let $queryData :=
for $db in db:list()
(: Assign dynamic variables to generate query, to be used in eval :)
let $query := "declare variable $db external; " || "db:open($db)" || $n
return xquery:eval($query,map { 'db': $db, 'query': $n })

return distinct-values($queryData)
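One way to restrict the evaluation to matching documents is to filter the
resource paths returned by db:list($db) before building the query. A sketch,
assuming the file-name substring is passed in via an additional external
variable ($part is a hypothetical name):

```xquery
declare variable $n as xs:string external;     (: query fragment, as before :)
declare variable $part as xs:string external;  (: hypothetical: file-name substring :)

let $queryData :=
  for $db in db:list()
  (: only documents whose path contains the substring :)
  for $path in db:list($db)[contains(., $part)]
  let $query := "declare variable $db external; declare variable $path external; "
             || "db:open($db, $path)" || $n
  return xquery:eval($query, map { 'db': $db, 'path': $path })
return distinct-values($queryData)
```

The two-argument db:open($db, $path) opens only the addressed resource, so the
inner query never touches documents whose names do not match.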

- Mansi


Re: [basex-talk] Guidance on Indexing

2016-01-03 Thread Mansi Sheth
Thanks Christian, as always a quick and detailed response.

1. I am not 100% clear whether you are motivating me towards or against
FULLTEXT indexing :)

2. Yes, I am dealing with GBs of XML files. I create new databases via the
Java API using the CreateDB class. Should I be using MainOptions to set the
AUTOOPTIMIZE and UPDINDEX options before each new DB creation? In the
MainOptions class, I didn't find any auto-optimize option; am I missing
something? Since I am setting options through this method anyway, should I
also set the FTINDEX or ATTRINDEX attribute (based on your response to 1)
before creating each DB? I would hate to run an optimization script after
each DB update (updates happen daily).
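As a point of comparison, the same index options can also be set per database
at creation time from XQuery itself, via the options map of db:create. A
sketch; the database name and input path are placeholders:

```xquery
(: create a database with index options set up front;
   'bi_output_new' and the input file are hypothetical names :)
db:create(
  'bi_output_new',
  '/data/input/file.xml',
  'file.xml',
  map { 'attrindex': true(), 'updindex': true() }
)
```

Options passed this way apply only to the newly created database, so existing
databases would still need an explicit OPTIMIZE run.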

Please advise,
- Mansi

On Sun, Jan 3, 2016 at 4:52 PM, Christian Grün 
wrote:

> Hi Mansi,
>
> > 1. Most of my xqueries are of below nature
> >
> > '/Archives/descendant::apiCalls[contains(@name,"com.sun")]/@name', where
> > apiCalls could be 3-4 level under 'Archives'. Xqueries are accessed via
> REST
>
> The existing index structures won’t allow you to look for arbitrary
> sub strings; see [1] for more information.
>
> You are right, the full-text index may be a possibly way out. Prefix
> searches can be realized via the "using wildcards" option [2]:
>
>   //*[text() contains text "abc.*" using wildcards]
>
> Please note that the query string will always be "tokenized": if you
> are looking for "com.sun", you will also get results like "COM SUN!".
>
> > 2. I have 1000s of documents, spanning over 100 XML DB, with total space
> > around 400 GB currently. Each query is taking roughly 30 mins, to run.
> >
> > My concern is, at each DB update, I am using attribute indexing, but info
> > command on basex prompt tells me otherwise. Am I misreading something ?
> Is
> > there a way to fix this once DB is created ? Its takes me 48 hours, to
> > create DBs from scratch... :)
>
> If UPDINDEX and AUTOOPTIMIZE is false, you will need to call
> "OPTIMIZE" after your updates.
>
> If you create a new database, you can set UPDINDEX and AUTOOPTIMIZE to
> true. However, AUTOOPTIMIZE will get incredibly slow if you are
> working with gigabytes of XML data.
>
> > Reading thru UPDINDEX and AUTOOPTIMIZE ALL commands, tells me to open
> each
> > DB and run these commands. Is that my option ? Do we have a xquery script
> > somewhere which I can use to do this ?
>
> If your databases are called "db1" ... "db100", the following XQuery
> script will optimize all those databases:
>
>   for $i in 1 to 100
>   return db:optimize('db' || $i)
>
> You can also create a command script [3] with XQuery:
>
>   <commands>{
>     for $i in 1 to 100
>     return (
>       <open>{ 'db' || $i }</open>,
>       <optimize/>
>     )
>   }</commands>
>
> You can store the result as a .bxs file and run it afterwards.
>
> Before you create all index structures, you should probably run your
> queries on some smaller database instances and check out the "Query
> Info" panel in the GUI. It will tell you if an index is used or not.
>
> Best,
> Christian
>
> [1] http://docs.basex.org/wiki/Indexes#Value_Indexes
> [2] http://docs.basex.org/wiki/Full-Text#Match_Options
> [3] http://docs.basex.org/wiki/Commands#Command_Scripts
>



-- 
- Mansi


[basex-talk] Guidance on Indexing

2016-01-03 Thread Mansi Sheth
Hello,

A very happy new year to all of you !!!

I have some very basic questions with indexing.

1. Most of my XQueries are of the nature below:

'/Archives/descendant::apiCalls[contains(@name,"com.sun")]/@name', where
apiCalls could be 3-4 levels under 'Archives'. XQueries are accessed via REST.

Based on this, I used attribute indexing after each update to the DB. Am I
correct? Should I have been using full-text indexing instead? Why?
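For context, the distinction that matters for the attribute index is exact
comparison vs. substring matching. A sketch, with a hypothetical attribute
value:

```xquery
(: exact comparison: a candidate for the attribute index :)
/Archives/descendant::apiCalls[@name = 'com.sun.Foo']/@name,

(: substring match, as in the query above: generally cannot be answered
   from the attribute index, so all @name values are scanned :)
/Archives/descendant::apiCalls[contains(@name, 'com.sun')]/@name
```

Whether the first form actually uses the index can be checked in the GUI's
Query Info panel, as suggested elsewhere in this thread.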

2. I have 1000s of documents, spanning over 100 XML DBs, with a total size of
around 400 GB currently. Each query takes roughly 30 minutes to run. Though
this performance is about what I expected, I know I can do better with
indexing. Currently, when I looked at one of the DBs:

> open bi_output_3
Database 'bi_output_3' was opened in 38.22 ms.
> info db
Database Properties
 Name: bi_output_3
 Size: 3938 MB
 Nodes: 16193129
 Documents: 35
 Binaries: 0
 Timestamp: 2016-01-03T13:40:40.000Z

Resource Properties
 Timestamp: 2016-01-03T13:40:40.776Z
 Encoding: UTF-8
 CHOP: true

Indexes
 Up-to-date: false
 TEXTINDEX: false
 ATTRINDEX: false
 FTINDEX: false
 LANGUAGE: English
 STEMMING: false
 CASESENS: false
 DIACRITICS: false
 STOPWORDS:
 UPDINDEX: false
 AUTOOPTIMIZE: false
 MAXCATS: 100
 MAXLEN: 96

When looked at its HDD footprint:

ubuntu@/BaseXDB/bi_output_3$ ls -l
total 4032992
-rw-rw-r-- 1 ubuntu ubuntu 2209449064 Jan  1 17:00 atv.basex
-rw-rw-r-- 1 ubuntu ubuntu  4 Jan  1 16:35 atvl.basex
-rw-rw-r-- 1 ubuntu ubuntu  0 Jan  1 16:35 atvr.basex
-rw-rw-r-- 1 ubuntu ubuntu   6414 Jan  3 13:40 doc.basex
-rw-rw-r-- 1 ubuntu ubuntu  6 Jan  1 17:00 ftxx.basex
-rw-rw-r-- 1 ubuntu ubuntu  0 Jan  1 17:00 ftxy.basex
-rw-rw-r-- 1 ubuntu ubuntu  0 Jan  1 17:00 ftxz.basex
-rw-rw-r-- 1 ubuntu ubuntu829 Jan  3 13:40 inf.basex
-rw-rw-r-- 1 ubuntu ubuntu 28 Jan  1 17:00 swl.basex
-rw-rw-r-- 1 ubuntu ubuntu 1916444672 Jan  3 13:40 tbl.basex
-rw-rw-r-- 1 ubuntu ubuntu3796037 Jan  3 13:40 tbli.basex
-rw-rw-r-- 1 ubuntu ubuntu  45462 Jan  1 17:00 txt.basex
-rw-rw-r-- 1 ubuntu ubuntu  4 Jan  1 16:35 txtl.basex
-rw-rw-r-- 1 ubuntu ubuntu  0 Jan  1 16:35 txtr.basex
ubuntu@/BaseXDB/bi_output_3$ pwd
/veracode/msheth/BaseXDB/bi_output_3
ubuntu@/BaseXDB/bi_output_3$

My concern is: at each DB update I am using attribute indexing, but the info
command at the basex prompt tells me otherwise. Am I misreading something? Is
there a way to fix this once the DB is created? It takes me 48 hours to
create the DBs from scratch... :)

Reading through the UPDINDEX and AUTOOPTIMIZE ALL commands tells me to open
each DB and run these commands. Is that my only option? Is there an XQuery
script somewhere which I can use to do this?
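One way to run an optimize pass over every database without opening each one
by hand is to iterate over db:list(). A sketch, assuming the attribute index
should be rebuilt:

```xquery
(: optimize every database; $all = true() rebuilds the index structures,
   and the options map requests the attribute index :)
for $db in db:list()
return db:optimize($db, true(), map { 'attrindex': true() })
```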

Thanks,
- Mansi


Re: [basex-talk] [bxerr:BXDB0002] Too many open files

2015-12-07 Thread Mansi Sheth
Thanks Joe for your input. I haven't tried all the options yet, but will
surely go thru.

I guess what I was trying to see is whether there is a way I can optimize my
XQueries to close open databases which they no longer need. Currently, my
queries are something of the nature below. I am wondering if there is a
better way to deal with the xquery:eval part of the code below.

declare variable $n as xs:string external;
declare option output:item-separator "&#10;";


let $queryData :=
  for $db in db:list()
  let $query := "declare variable $db external; " || "db:open($db)" || $n
  return xquery:eval($query, map { 'db': $db, 'query': $n })

return distinct-values($queryData)

On Mon, Dec 7, 2015 at 1:30 PM, Joe Wicentowski <joe...@gmail.com> wrote:

> Hi Mansi,
>
> The results of ulimit can be misleading.  See this article - which really
> helped me when I encountered this issue (though not with BaseX):
>
>
> https://underyx.me/2015/05/18/raising-the-maximum-number-of-file-descriptors.html
>
> Joe
>
> On Mon, Dec 7, 2015 at 1:22 PM, Mansi Sheth <mansi.sh...@gmail.com> wrote:
>
>> Thanks Christian,
>>
>> I had already set open files limit on the OS:
>>
>> ubuntu@ip-10-0-0-83:~$ ulimit -Hn
>>
>> 
>>
>> However, I still face exact same problem. Process breaks at the same db
>> count
>>
>> [bxerr:BXDB0002] Resource
>> "/veracode/msheth/BaseXDB/bi_output_715/inf.basex (Too many open files)"
>> not found.
>>
>> On Sat, Dec 5, 2015 at 7:53 AM, Christian Grün <christian.gr...@gmail.com
>> > wrote:
>>
>>> Hi Mansi,
>>>
>>> If you are working with Linux, you may need to increase the maximum
>>> file limit with "ulimit -n" [1].
>>>
>>> Hope this helps,
>>> Christian
>>>
>>> [1] http://www.linuxhowtos.org/Tips%20and%20Tricks/ulimit.htm
>>>
>>>
>>>
>>> On Fri, Dec 4, 2015 at 7:52 PM, Mansi Sheth <mansi.sh...@gmail.com>
>>> wrote:
>>> > Hello,
>>> >
>>> > I am importing BaseX, with tons of XML files. Currently I have roughly
>>> 1600
>>> > databases, I am starting basexhttp service, to access it over a web
>>> service
>>> > endpoint, thru a xquery file. Using BaseX 8.2.3.
>>> >
>>> > I am receiving below error:
>>> >
>>> > [bxerr:BXDB0002] Resource
>>> "/veracode/msheth/BaseXDB/bi_output_713/inf.basex
>>> > (Too many open files)" not found.
>>> >
>>> > basexhttp, is running with 10240M virtual memory.
>>> >
>>> > I can share the xquery file, if thats needed.
>>> >
>>> > Has anyone experienced this before ? Is there a limit on no of
>>> databases
>>> > supported by BaseX ? Is there some configuration option, which I can
>>> use to
>>> > close already queried database ?
>>> >
>>> > Thanks,
>>> > - Mansi
>>>
>>
>>
>>
>> --
>> - Mansi
>>
>
>


-- 
- Mansi


Re: [basex-talk] [bxerr:BXDB0002] Too many open files

2015-12-07 Thread Mansi Sheth
Thanks Christian,

I had already set open files limit on the OS:

ubuntu@ip-10-0-0-83:~$ ulimit -Hn



However, I still face exactly the same problem. The process breaks at the
same DB count:

[bxerr:BXDB0002] Resource "/veracode/msheth/BaseXDB/bi_output_715/inf.basex
(Too many open files)" not found.

On Sat, Dec 5, 2015 at 7:53 AM, Christian Grün <christian.gr...@gmail.com>
wrote:

> Hi Mansi,
>
> If you are working with Linux, you may need to increase the maximum
> file limit with "ulimit -n" [1].
>
> Hope this helps,
> Christian
>
> [1] http://www.linuxhowtos.org/Tips%20and%20Tricks/ulimit.htm
>
>
>
> On Fri, Dec 4, 2015 at 7:52 PM, Mansi Sheth <mansi.sh...@gmail.com> wrote:
> > Hello,
> >
> > I am importing BaseX, with tons of XML files. Currently I have roughly
> 1600
> > databases, I am starting basexhttp service, to access it over a web
> service
> > endpoint, thru a xquery file. Using BaseX 8.2.3.
> >
> > I am receiving below error:
> >
> > [bxerr:BXDB0002] Resource
> "/veracode/msheth/BaseXDB/bi_output_713/inf.basex
> > (Too many open files)" not found.
> >
> > basexhttp, is running with 10240M virtual memory.
> >
> > I can share the xquery file, if thats needed.
> >
> > Has anyone experienced this before ? Is there a limit on no of databases
> > supported by BaseX ? Is there some configuration option, which I can use
> to
> > close already queried database ?
> >
> > Thanks,
> > - Mansi
>



-- 
- Mansi


[basex-talk] [bxerr:BXDB0002] Too many open files

2015-12-04 Thread Mansi Sheth
Hello,

I am importing tons of XML files into BaseX. Currently I have roughly 1600
databases. I am starting the basexhttp service to access them over a web
service endpoint, through an XQuery file. I am using BaseX 8.2.3.

I am receiving below error:

[bxerr:BXDB0002] Resource "/veracode/msheth/BaseXDB/bi_output_713/inf.basex
(Too many open files)" not found.

basexhttp, is running with 10240M virtual memory.

I can share the xquery file, if thats needed.

Has anyone experienced this before? Is there a limit on the number of
databases supported by BaseX? Is there some configuration option which I can
use to close an already-queried database?

Thanks,
- Mansi


[basex-talk] Basex 8.2.3 data not shown, until OS restart

2015-09-01 Thread Mansi Sheth
Hello,

This is something very weird. I import a bunch of XML files into this latest
version of BaseX using the Java API. When I try to access this data (via the
basex command-line client or REST), no databases are shown at all. The
databases show up only after restarting the OS.

Is this a known issue? Or is it just me?

- Mansi


Re: [basex-talk] Finding document based on filename

2015-09-01 Thread Mansi Sheth
Thanks guys for all the expert comments. Currently, I am experimenting with
the performance of just deleting and inserting using the Java API. If this
process takes a tiny bit longer, I don't really care, is what I figured :)
If it becomes unacceptable, I will use one of these suggestions.

Thanks once again.

StringList databases = List.list(context);
String query = "";

for (String database : databases) {
    query = "db:list('" + database + "')";
    try {
        for (String fileName : query(query).split(" ")) {
            query = "db:delete('" + database + "','" + fileName + "')";
            if (fileName.contains(XMLFileName.split("_")[1])) {
                query(query);
                logger.info("Deleted " + fileName + " from " + database);
                retVal = true;
                break;
            }
        }
    } catch (BaseXException e) {
        e.printStackTrace();
    }
}
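The same delete-by-file-name-substring pass can also be expressed as a single
XQuery, avoiding one round trip per file. A sketch, with $part standing in
for the file-name fragment that the Java code extracts:

```xquery
declare variable $part as xs:string external;  (: hypothetical: part of the file name :)

(: delete every resource whose path contains the fragment, in any database :)
for $db in db:list()
for $path in db:list($db)[contains(., $part)]
return db:delete($db, $path)
```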

On Mon, Aug 31, 2015 at 9:45 PM, Martín Ferrari 
wrote:

> I forgot one thing, I got much better performance by just calling
> replace rather than delete and insert, but this is a db with more than one
> million records. If performance is not important, I believe either way will
> do.
>
> Martín.
>
> --
> From: ferrari_mar...@hotmail.com
> To: mansi.sh...@gmail.com; basex-talk@mailman.uni-konstanz.de
> Date: Mon, 31 Aug 2015 16:35:33 +
> Subject: Re: [basex-talk] Finding document based on filename
>
>
> Hi Mansi,
>  I have a similar situation. I don't think there's a fast way to get
> documents by only knowing a part of their names. It seems you need to know
> the exact name. In my case, we might be able to group documents by a common
> id, so we might create subfolders inside the DB and store/get the contents
> of the subfolder directly, which is pretty fast.
>  I've also tried indexing, but insertions got really slow (I assume
> maybe because indexing is not granular, it indexes all values) and we
> need performance.
>
>  Oh, I've also tried using starts-with() instead of contains(), but it
> seems it does not pick up indexes.
>
> Martín.
>
> --
> Date: Fri, 28 Aug 2015 16:52:37 -0400
> From: mansi.sh...@gmail.com
> To: basex-talk@mailman.uni-konstanz.de
> Subject: [basex-talk] Finding document based on filename
>
> Hello,
>
> I will be having 100s of databases, with each database holding 100 XML
> documents. I want to devise an algorithm where, given a part of an XML file
> name, I want to know which database(s) contain it, or null if the document is
> not currently present in any database. Based on that, add the current document
> into the database. This is to always maintain the latest version of a document
> in the DB, removing the older version while adding the newer version.
>
> So far, only way I could come up with is:
>
> for $db in all-databases:
>   open $db
>   $fileNames = list $db
> for eachFileName in $fileNames:
>if $eachFileName.contains(sub-xml filename):
> add to ret-list-db
>
> return ret-list-db
>
> The above algorithm seems highly inefficient. Is there any indexing which
> can be done? Do you suggest that, for each document insert, I should maintain a
> separate XML document which lists each file inserted, etc.?
>
> Once I get hold of the above list of DBs, I would eventually be deleting that
> file and inserting the latest version of that file (which would have the same
> sub-XML file name). So constant updating of this external document also
> seems painful (maybe?).
>
> Also, would it be faster using XQUERY script files thru Java code, or
> using the Java API for such operations?
>
> How do you all deal with such operations ?
>
> - Mansi
>



-- 
- Mansi


[basex-talk] Finding document based on filename

2015-08-28 Thread Mansi Sheth
Hello,

I will be having 100s of databases, with each database holding 100 XML
documents. I want to devise an algorithm where, given a part of an XML file
name, I want to know which database(s) contain it, or null if the document is
not currently present in any database. Based on that, add the current document
into the database. This is to always maintain the latest version of a document
in the DB, removing the older version while adding the newer version.

So far, only way I could come up with is:

for $db in all-databases:
  open $db
  $fileNames = list $db
for eachFileName in $fileNames:
   if $eachFileName.contains(sub-xml filename):
add to ret-list-db

return ret-list-db

The above algorithm seems highly inefficient. Is there any indexing which can
be done? Do you suggest that, for each document insert, I should maintain a
separate XML document which lists each file inserted, etc.?

Once I get hold of the above list of DBs, I would eventually be deleting that
file and inserting the latest version of that file (which would have the same
sub-XML file name). So constant updating of this external document also
seems painful (maybe?).

Also, would it be faster using XQUERY script files thru Java code, or
using the Java API for such operations?

How do you all deal with such operations ?

- Mansi
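A rough XQuery sketch of the lookup described above, assuming BaseX's db module ($part is the known fragment of the file name; the names are illustrative):

```xquery
(: Return every database that holds a resource whose path contains $part. :)
declare variable $part external;
for $db in db:list()
where some $path in db:list($db) satisfies contains($path, $part)
return $db
```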


Re: [basex-talk] XQuery Optimization suggestions

2015-01-20 Thread Mansi Sheth
As part of preparing to present at XML Prague, I am working on a slide
showing statistics. From the comments below, I started thinking: would it be
best to show time taken against the size of the DB, or against the number of
nodes? What do you all think? If I frame it in terms of the number of nodes,
would that make for a better comparison with other tools? For e.g.

1 million records in a SQL database ~= 1 million nodes in BaseX, thus making
it closer to an apples-to-apples comparison for time taken.

We are currently battling with this at work too. There are a few different
approaches for data mining, for different data sources. I talk in terms of
GBs of data in the database, and SQL fans talk in terms of millions of records.
It's hard to make any progress and push for NXDs.

- Mansi


On Sun, Jan 18, 2015 at 11:24 AM, Christian Grün christian.gr...@gmail.com
wrote:

  Just finished processing 310GB of data, with result set worth 11 million
  records within 44 minutes. I am currently psyched with the potential of
 even
  BaseX supporting this kind of data. But I am no expert here.
 
  What are your views on this performance statistics  ?

 My assumption is that it basically boils down to a sequential scan of
 most of the elements in the database (so buying faster SSDs will
 probably be the safest choice to speed up your queries..). 310 GB is a
 lot, so 44 minutes is probably not that bad. Speaking for myself,
 though, I was sometimes surprised that other NoSQL systems I tried
 were not really faster than BaseX, if you have hierarchical data
 structures, and if you need to post-process large amounts of data.

 However, as your queries look pretty simple, you could also have a
 look at e.g. MongoDB or RethinkDB (provided that the data can be
 converted to JSON). Those systems give you convenient Big Data
 features like distribution/sharding or replication.

 But I'm also interested what others say about this.
 Christian

 
  - Mansi
 
  On Sun, Jan 18, 2015 at 10:49 AM, Christian Grün 
 christian.gr...@gmail.com
  wrote:
 
  Hi Mansi,
 
  
  
  http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::c/descendant::a[contains(@name,
  "xyz")]/@name/data()
 
  My guess is that most time is spent to parse all the nodes in the
  database. If you know more about the database structure, you could
  replace some of the descendant with explicit child steps. Apart from
  that, I guess I'm repeating myself, but have you tried to remove
  duplicates in XQuery, or do grouping and sorting in the language?
  Usually, it's recommendable to do as much as possible in XQuery itself
  (although it might not be obvious how to do this at first glance).
 
  Christian
 
 
 
 
  --
  - Mansi




-- 
- Mansi


[basex-talk] XQuery Optimization suggestions

2015-01-18 Thread Mansi Sheth
Hello,

I am doing some performance analysis on the size of XML files in the DB, the
number of records in a result set, and how much time it takes to get results.

Currently, I have 150GB worth of XML documents imported into BaseX DB. It
took roughly 21 minutes to return a result set worth 5.3 million records.

Queries are of below form:

http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::c/descendant::a[contains(@name,
"xyz")]/@name/data()

XQUERY File:
for $db in db:list()
(: Assign dynamic variables to generate query, to be used in eval :)
let $query := "declare variable $db external; " || "db:open($db)"
|| $n
return xquery:eval($query, map { 'db': $db, 'query': $n })


I have a few questions around this.

1. I have been routinely advised on this mailing list to avoid
serialization from XPath and let XQuery handle it. I tried a few things,
like replacing string() with data(), adding a serialization option on the REST
call, in the XQUERY file, etc. But I don't see any performance gain. Is there
something else I can try, or something I am doing wrong?

2. Does anyone have any resources comparing this performance to other NoSQL
databases? I am just very curious how the above performance numbers compare
to other DBs.

- Mansi


Re: [basex-talk] XQuery Optimization suggestions

2015-01-18 Thread Mansi Sheth
The structure of the data is nested, so I have to write queries this way,
unfortunately. Also, I am doing performance analysis after removing all external
parameters (any kind of post-processing, network latency, etc.), just
isolating whether I can do any better. So I guess this is the best I can do... No
problem at all.

Just finished processing 310GB of data, with a result set worth 11 million
records, within 44 minutes. I am currently psyched about the potential of
even BaseX supporting this kind of data. But I am no expert here.

What are your views on these performance statistics?

- Mansi

On Sun, Jan 18, 2015 at 10:49 AM, Christian Grün christian.gr...@gmail.com
wrote:

 Hi Mansi,

 
 http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::c/descendant::a[contains(@name,
 "xyz")]/@name/data()

 My guess is that most time is spent to parse all the nodes in the
 database. If you know more about the database structure, you could
 replace some of the descendant with explicit child steps. Apart from
 that, I guess I'm repeating myself, but have you tried to remove
 duplicates in XQuery, or do grouping and sorting in the language?
 Usually, it's recommendable to do as much as possible in XQuery itself
 (although it might not be obvious how to do this at first glance).

 Christian




-- 
- Mansi


Re: [basex-talk] Silly XQUERY exception

2015-01-08 Thread Mansi Sheth
Lukas,

That was it!! I feel like shooting myself. What an oversight.

Thanks a ton for looking at it and spotting it.

- Mansi

On Thu, Jan 8, 2015 at 2:44 AM, Lukas Kircher lukaskirch...@gmail.com
wrote:

 Hi Mansi,

 let $cmd := "/A/*/descendant::C/descandant::*[contains(@name,'" || $n
 || "')]"


 Just a quick scan - I marked the problem above - I would try 'descendant'
 instead of 'descandant'.

 Cheers,
 Lukas




-- 
- Mansi


[basex-talk] Silly XQUERY exception

2015-01-07 Thread Mansi Sheth
Hello,

I feel very stupid and frustrated, not able to fix this error:

Below is my query code, which I am trying to run. I am passing the value
for the contains clause thru the command line, and I expect to receive the # of XML
files matching the $cmd XPath. I always get:

Stopped at /veracode/msheth/BaseXWeb/get_prevalence.xq, 16/12:
[XPST0003] Expecting ':=', found ':'.

When run thru REST:

[XPST0003] Expecting valid step, found 'd'.

I think it's the way $cmd is being set. I have tried simple string concat
using ||, using the concat command, using HTML entities, etc.

declare variable $n as xs:string external;
declare option output:item-separator "&#xa;";

(: let $cmd :=
concat("/A/*/descendant::C/descandant::*[contains(@name,", $singlequote, $n, $singlequote, ")]")
:)
let $cmd := "/A/*/descendant::C/descandant::*[contains(@name,'" || $n ||
"')]"

let $aPath :=
for $db in db:list()
let $query :=
  "declare variable $db external; " ||
  "db:open($db)" || $cmd
return xquery:eval($query,
map { 'db': $db, 'query': $cmd })

let $clients :=
for $elem in $aPath
return db:path($elem)

return ($n, distinct-values(count($clients)))

The culprit lines are the "let $cmd :=" assignment and the "return
xquery:eval" call above. Any and all suggestions are greatly appreciated.

- Mansi


[basex-talk] Design finding almost duplicate xml files

2015-01-02 Thread Mansi Sheth
Hello,

I am trying to come up with a design which, just before inserting an XML file
into the database, will warn us that there is an almost identical XML file
(with a different name and different size) already stored in the database.

Almost identical would be based on a few elements of the XML file, such as:

<root>
  <A name="...">
    <B name="...">
      <C name="..."/>
      <C name="..."/>
      ...
    </B>
  </A>
  ...
</root>

Same <A> and <B> as in the above snippet, but different <C>. Element <A> could be
repeated 100s of times in a single XML file.

Any pointers ?
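For what it's worth, one rough way to sketch this in XQuery — under the assumption that two files count as "almost identical" when their sorted <A>/<B> name pairs match, with the <C> elements ignored — might look like this (element names follow the snippet above; $candidate is a hypothetical external variable for the incoming document):

```xquery
(: Fingerprint a document by its sorted A/B name pairs, ignoring C. :)
declare function local:fingerprint($doc as document-node()) as xs:string {
  string-join(
    for $b in $doc//A/B
    order by $b/@name
    return $b/parent::A/@name || '/' || $b/@name,
    '|'
  )
};
declare variable $candidate external;  (: the document about to be inserted :)
for $db in db:list()
for $path in db:list($db)
where local:fingerprint(db:open($db, $path)) = local:fingerprint($candidate)
return $db || '/' || $path
```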

- Mansi


Re: [basex-talk] Out Of Memory

2014-12-30 Thread Mansi Sheth
Hello,

Wanted to get back to this email chain and share my experience.

I got this running beautifully (including all post processing of results),
using the below command:

curl -ig '
http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::D/@name/string()'
| cut -d: -f1 | cut -d. -f1-3 | sort | uniq -c | sort -n -r
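The shell post-processing stage can be illustrated on a made-up sample of the REST output (the @name values below are pure placeholders):

```shell
# Three hypothetical @name values, one per line, as the REST call would emit them.
printf 'com.example.app.foo:x\ncom.example.app.bar:y\ncom.example.web.baz:z\n' |
  cut -d: -f1 |    # keep the part before the first colon
  cut -d. -f1-3 |  # keep the first three dot-separated components
  sort | uniq -c | # count identical prefixes
  sort -n -r       # most frequent prefix first
# -> "2 com.example.app" ranked above "1 com.example.web"
```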

I am using the BaseX 8.0 beta 763cc93 build, running this on an i7 2.7GHz MBP,
giving 8GB to the basexhttp process. It took around 34 min on 41 GB of data. I
think a lot of the time went into post-processing (sorting) the result set,
rather than actually extracting the results from the BaseX DB.

When tried a similar query on a much smaller database(3GB) on a much
powerful amazon instance, giving 20GB RAM to basex http process, got me
results with post processing within 4 mins.

Thanks for all your inputs guys,

Keep BaseXing... !!!
- Mansi

On Fri, Nov 7, 2014 at 12:25 PM, Mansi Sheth mansi.sh...@gmail.com wrote:

 This email chain, is extremely helpful. Thanks a ton guys. Certainly one
 of the most helpful folks here :)

 I have to try a lot of these suggestions but currently I am being pulled
 into something else, so I have to pause for the time being.

 Will get back to this email thread, after trying a few things and my
 relevant observations.

 - Mansi

 On Fri, Nov 7, 2014 at 3:48 AM, Fabrice Etanchaud fetanch...@questel.com
 wrote:

  Hi Mansi,



 From what I can see,

 for each pqr value, you could use db:attribute-range to retrieve all the
 file names, group by/count to obtain statistics.

 You could also create a new collection from an extraction of only the
 data you need, changing @name into element and use full text fuzzy match.



 Hoping it helps



 Cordialement

 Fabrice



 *De :* basex-talk-boun...@mailman.uni-konstanz.de [mailto:
 basex-talk-boun...@mailman.uni-konstanz.de] *De la part de* Mansi Sheth
 *Envoyé :* jeudi 6 novembre 2014 20:55
 *À :* Christian Grün
 *Cc :* BaseX
 *Objet :* Re: [basex-talk] Out Of Memory



 I would be doing tons of post processing. I never use UI. I either use
 REST thru cURL or command line.



 I would basically need data in below format:



 XML File Name, @name



 I am trying to whitelist, picking up values only for
 starts-with(@name, "pqr"), where "pqr" is one of a list of 150-odd values.



 My file names are essentially some IDs/keys, which I would need to map
 further using SQLite to some values, and maybe group by them, etc.



 So, basically I am trying to visualize some data based on which XML files
 it exists in. So yes, count(query) would be fine, but it won't serve much
 purpose, since I still need the value "pqr".



 - Mansi





 On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün 
 christian.gr...@gmail.com wrote:

  Query: /A/*//E/@name/string()

 In the GUI, all results will be cached, so you could think about
 switching to command line.

 Do you really need to output all results, or do you do some further
 processing with the intermediate results?

 For example, the query count(/A/*//E/@name/string()) will probably
 run without getting stuck.



 
  This query, was going OOM, within few mins.
 
  I tried a few ways, of whitelisting, with contain clause, to truncate
 the
  result set. That didn't help too. So, now I am out of ideas. This is
 giving
  JVM 10GB of dedicated memory.
 
  Once, above query works and doesn't go Out Of Memory, I also need
  corresponding file names too:
 
  XYZ.xml //E/@name
  PQR.xml //E/@name
 
  Let me know if you would need more details, to appreciate the issue ?
  - Mansi
 
  On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün 
 christian.gr...@gmail.com
  wrote:
 
  Hi Mansi,
 
  I think we need more information on the queries that are causing the
  problems.
 
  Best,
  Christian
 
 
 
  On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com
 wrote:
   Hello,
  
   I have a use case, where I have to extract lots of information from
 each
   XML
   in each DB. Something like, attribute values of most of the nodes in
 an
   XML.
   For such, queries based goes Out Of Memory with below exception. I am
   giving
   it ~12GB of RAM on i7 processor. Well I can't complain here since I
 am
   most
   definitely asking for loads of data, but is there any way I can get
   these
   kinds of data successfully ?
  
   mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
   BaseX 8.0 beta b45c1e2 [Server]
   Server was started (port: 1984)
   HTTP Server was started (port: 8984)
   Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError:
 Java
   heap
   space
   at
  
  
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
   at
  
  
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
   at
  
  
 org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
   at
  
  
 org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll

[basex-talk] data protection at rest

2014-12-29 Thread Mansi Sheth
Hello,

I am thinking about my options for protecting BaseX DB data at rest. We are
storing sensitive client data which we need to protect. We will be
migrating BaseX DB to an Amazon instance, so protection at rest is a
primary concern.

What I wish to do is: while inserting data into the database, it should be
encrypted, and while querying with XQUERY, we should have a mechanism to decrypt
the data and retrieve the information needed. I am aware of the performance hit
here. I would evaluate whether it's acceptable after I collect some
statistics.

I looked at the docs:
http://docs.basex.org/wiki/Cryptographic_Module#Encryption_.26_Decryption

But I didn't completely understand the use case for this example, or whether it
would solve my purpose. I am currently using some Java code to insert files
into the database.
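For reference, a minimal sketch of symmetric encryption with that module, assuming the documented crypto:encrypt/crypto:decrypt signatures (the key and payload are placeholders; AES here expects a 16-byte key):

```xquery
let $key    := "1234567890123456"  (: placeholder key, never hard-code one in production :)
let $cipher := crypto:encrypt("sensitive client data", "symmetric", $key, "AES")
return crypto:decrypt($cipher, "symmetric", $key, "AES")
```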

Has anyone done something on this line ? Please share some use cases.

- Mansi


Re: [basex-talk] Distributed processing on roadmap ?

2014-11-20 Thread Mansi Sheth
Sorry about the delay. I was busy preparing a presentation for my company
on BaseX as our analytics solution. It was very well received. All
thanks to you and everyone on this user list :)

Based on my use cases, I believe (again, I am no expert in this domain) a
map/reduce approach would work better. The result set being returned would
contain at most a couple of thousand records with some post-processing on it,
as compared to the TBs of data being queried. If the querying and processing
steps could use processing power from a cluster of nodes, maybe we might get a
significant performance gain? What are your thoughts? What other use
cases do you come across?

- Mansi

On Mon, Nov 17, 2014 at 10:50 AM, Christian Grün christian.gr...@gmail.com
wrote:

 Hi Mansi,

 it's nice to hear that you have been successfully scaling your
 database instances so far.

  I love using BaseX and the powers of BaseX. Currently I am able to query
 ~60GB of XML files in under 2.5 mins. I still have a few more optimizations
 to try. I also see this data increasing to a couple of TB shortly.
 
  I would love to see if this kind of processing is almost real time
 (within a min). So my question is there any discussions around supporting
 distributed processing or clusters of nodes etc ?

 Yes, distributed processing is a frequently discussed topic. One of
 our major questions is what challenge to solve first. As you surely
 know, there are so many different NoSQL stores out there, and all of
 them tackle different problems. Up to now, we spent most time on
 replication, but this would not give you better performance.

 So I would be interested to hear what kind of distribution techniques
 you believe would give you better performance. Do you think that a
 map/reduce approach would be helpful, or do you simply have lots of
 data that somehow needs to be sent to a client as quickly as possible?
 In other words, how large are your results sets? Do you really need
 the complete results, or would you rather like to draw some
 conclusions from the scanned data?

 Back to the current technology… Maybe you could do some Java profiling
 (using e.g. -Xrunhprof:cpu=samples) in order to find out what's the
 current bottleneck.

 Best,
 Christian




-- 
- Mansi


Re: [basex-talk] Out Of Memory

2014-11-07 Thread Mansi Sheth
This email chain, is extremely helpful. Thanks a ton guys. Certainly one of
the most helpful folks here :)

I have to try a lot of these suggestions but currently I am being pulled
into something else, so I have to pause for the time being.

Will get back to this email thread, after trying a few things and my
relevant observations.

- Mansi

On Fri, Nov 7, 2014 at 3:48 AM, Fabrice Etanchaud fetanch...@questel.com
wrote:

  Hi Mansi,



 From what I can see,

 for each pqr value, you could use db:attribute-range to retrieve all the
 file names, group by/count to obtain statistics.

 You could also create a new collection from an extraction of only the data
 you need, changing @name into element and use full text fuzzy match.



 Hoping it helps



 Cordialement

 Fabrice



 *De :* basex-talk-boun...@mailman.uni-konstanz.de [mailto:
 basex-talk-boun...@mailman.uni-konstanz.de] *De la part de* Mansi Sheth
 *Envoyé :* jeudi 6 novembre 2014 20:55
 *À :* Christian Grün
 *Cc :* BaseX
 *Objet :* Re: [basex-talk] Out Of Memory



 I would be doing tons of post processing. I never use UI. I either use
 REST thru cURL or command line.



 I would basically need data in below format:



 XML File Name, @name



 I am trying to whitelist, picking up values only for
 starts-with(@name, "pqr"), where "pqr" is one of a list of 150-odd values.



 My file names are essentially some IDs/keys, which I would need to map
 further using SQLite to some values, and maybe group by them, etc.



 So, basically I am trying to visualize some data based on which XML files
 it exists in. So yes, count(query) would be fine, but it won't serve much
 purpose, since I still need the value "pqr".



 - Mansi





 On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün christian.gr...@gmail.com
 wrote:

  Query: /A/*//E/@name/string()

 In the GUI, all results will be cached, so you could think about
 switching to command line.

 Do you really need to output all results, or do you do some further
 processing with the intermediate results?

 For example, the query count(/A/*//E/@name/string()) will probably
 run without getting stuck.



 
  This query, was going OOM, within few mins.
 
  I tried a few ways, of whitelisting, with contain clause, to truncate the
  result set. That didn't help too. So, now I am out of ideas. This is
 giving
  JVM 10GB of dedicated memory.
 
  Once, above query works and doesn't go Out Of Memory, I also need
  corresponding file names too:
 
  XYZ.xml //E/@name
  PQR.xml //E/@name
 
  Let me know if you would need more details, to appreciate the issue ?
  - Mansi
 
  On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün 
 christian.gr...@gmail.com
  wrote:
 
  Hi Mansi,
 
  I think we need more information on the queries that are causing the
  problems.
 
  Best,
  Christian
 
 
 
  On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com
 wrote:
   Hello,
  
   I have a use case, where I have to extract lots of information from
 each
   XML
   in each DB. Something like, attribute values of most of the nodes in
 an
   XML.
   For such, queries based goes Out Of Memory with below exception. I am
   giving
   it ~12GB of RAM on i7 processor. Well I can't complain here since I am
   most
   definitely asking for loads of data, but is there any way I can get
   these
   kinds of data successfully ?
  
   mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
   BaseX 8.0 beta b45c1e2 [Server]
   Server was started (port: 1984)
   HTTP Server was started (port: 8984)
   Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError:
 Java
   heap
   space
   at
  
  
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
   at
  
  
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
   at
  
  
 org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
   at
  
  
 org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
   at
  
  
 org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
   at
  
  
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
   at java.lang.Thread.run(Thread.java:744)
  
  
   --
   - Mansi
 
 
 
 
  --
  - Mansi





 --

 - Mansi




-- 
- Mansi


Re: [basex-talk] Dynamic Evaluation of XQUERY

2014-11-07 Thread Mansi Sheth
Christian,

I am running out of ideas for debugging this. When I execute this query
directly within the XQUERY file, it works perfectly. Just when I pass it thru
the command line, it breaks.

In fact, the actual .xq file doesn't matter either; as you pointed out, parsing
from the command line is broken. I tried the -d switch and escaping spaces, but
that didn't help. Also, I tested that this is a valid XPATH query.

Please pardon my XQUERY knowledge; it's really not my background.

- Mansi



On Thu, Nov 6, 2014 at 8:45 AM, Christian Grün christian.gr...@gmail.com
wrote:

 Hi Mansi,

  ~/Downloads/basex/bin/basex -bn='/Archives/*//class[contains(@name,"abc")
  and contains(@name,"pqr")]' get_paths.xq
  Stopped at /Users/mansiadmin/Documents/Research-Projects/BigData, 1/4:
  [XPDY0002] and: no context value bound.

 It seems that and was interpreted as XPath step, so it seems as if
 something went wrong when parsing your query on command line (I doubt
 that it's something specific to BaseX).

 Maybe you can simply try to output the query that causes the error,
 instead of trying to evaluate it?

 Christian


 
  However, below query works as a charm:
 
  ~/Downloads/basex/bin/basex
 -bn='/Archives/*//class[contains(@name,"abc")]'
  get_paths.xq
 
  I am hoping, for first query above, its some syntactic issue at my end.
 But,
  couldn't fix it, so thought should point out. Please advise.
 
  Code:
 
  declare variable $n as xs:string external;
  declare option output:item-separator "&#xa;";
 
  let $aPath :=
  for $db in db:list()
  let $query :=
    "declare variable $db external; " ||
    "db:open($db)" || $n
  return xquery:eval($query,
map { 'db': $db, 'query': $n })
 
  let $paths :=
  for $elem in $aPath
  return db:path($elem)
 
  return distinct-values($paths)
 
  On Mon, Nov 3, 2014 at 6:48 PM, Christian Grün 
 christian.gr...@gmail.com
  wrote:
 
  …in the meanwhile, could you please check if the bug has possibly been
  fixed in the latest 8.0 snapshot [1]?
 
  [1] http://files.basex.org/releases/latest
 
 
  On Tue, Nov 4, 2014 at 12:46 AM, Christian Grün
  christian.gr...@gmail.com wrote:
   Improper use? Potential bug? Your feedback is welcome:
  
   Sounds like a little bug indeed; I will check it tomorrow!
  
  
   Contact: basex-talk@mailman.uni-konstanz.de
   Version: BaseX 7.9
   Java: Oracle Corporation, 1.7.0_45
   OS: Mac OS X, x86_64
   Stack Trace:
   java.lang.NullPointerException
   at org.basex.query.value.item.Str.get(Str.java:49)
   at org.basex.query.func.FNDb.path(FNDb.java:489)
   at org.basex.query.func.FNDb.item(FNDb.java:128)
   at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:45)
   at org.basex.query.func.FNDb.iter(FNDb.java:92)
   at org.basex.query.gflwor.GFLWOR$2.next(GFLWOR.java:78)
   at org.basex.query.MainModule$1.next(MainModule.java:98)
   at org.basex.core.cmd.AQuery.query(AQuery.java:91)
   at org.basex.core.cmd.XQuery.run(XQuery.java:22)
   at org.basex.core.Command.run(Command.java:329)
   at org.basex.core.Command.execute(Command.java:94)
   at org.basex.server.LocalSession.execute(LocalSession.java:121)
   at org.basex.server.Session.execute(Session.java:37)
   at org.basex.core.CLI.execute(CLI.java:106)
   at org.basex.BaseX.init(BaseX.java:123)
   at org.basex.BaseX.main(BaseX.java:42)
  
  
   On Thu, Oct 30, 2014 at 5:54 AM, Christian Grün
   christian.gr...@gmail.com
   wrote:
  
   Hi Mansi,
  
   you have been close! It could work with the following query (I
 haven't
   tried it out, though):
  
   _ get_query_result.xq 
  
   declare variable $n external;
    declare option output:item-separator "&#xa;";
  
   let $aList :=
 for $name in db:list()
 let $db := db:open($name)
 return xquery:eval($n, map { '': $db })
  
   return distinct-values($aList)
   __
  
   In this code, I'm opening the database in the main loop, and I then
   bind it to the empty string. This way, the database will be the
   context of the query to be evaluated query, and you won't have to
 deal
   with bugs that arise from the concatenation of db:open and the
 query
   string.
  
1. Can we assign dynamic values as a value to a map's key ?
2. Can I map have more than one key, in query:eval ?
  
   This is both possible. As you see in the following query, you'll
 again
   have to declare the variables that you want to bind. I agree this
   causes a lot of code, so we may simplify it again in a future
 version
   of BaseX:
   __
  
    let $n := "/a/b/c"
   for $db in db:list()
   let $query :=
  "declare variable $db external; " ||
  "db:open($db)" || $n
   return xquery:eval($query,
 map { 'db': $db, 'query': $n })
   __
  
   Best,
   Christian
  
  
  
  
   --
   - Mansi
 
 
 
 
  --
  - Mansi




-- 
- Mansi


Re: [basex-talk] Out Of Memory

2014-11-06 Thread Mansi Sheth
This would need a lot of details, so bear with me below:

Briefly my XML files look like:

<A name="...">
  <B name="...">
    <C name="...">
      <D name="...">
        <E name="..."/>

<A> can contain <B>, <C> or <D>, and <B>, <C> or <D> can contain <E>. We have 1000s
(currently 3000 in my test data set) of such XML files, of size 50MB on
average. It's tons of data! Currently, my database is ~18GB in size.

Query: /A/*//E/@name/string()

This query was going OOM within a few minutes.

I tried a few ways of whitelisting, with a contains clause, to truncate the
result set. That didn't help either. So now I am out of ideas. This is giving
the JVM 10GB of dedicated memory.

Once the above query works and doesn't go Out Of Memory, I will also need the
corresponding file names too:

XYZ.xml //E/@name
PQR.xml //E/@name

Let me know if you need more details to appreciate the issue.
- Mansi

On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com
wrote:

 Hi Mansi,

 I think we need more information on the queries that are causing the
 problems.

 Best,
 Christian



 On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote:
  Hello,
 
  I have a use case, where I have to extract lots of information from each
 XML
  in each DB. Something like, attribute values of most of the nodes in an
 XML.
  For such, queries based goes Out Of Memory with below exception. I am
 giving
  it ~12GB of RAM on i7 processor. Well I can't complain here since I am
 most
  definitely asking for loads of data, but is there any way I can get these
  kinds of data successfully ?
 
  mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
  BaseX 8.0 beta b45c1e2 [Server]
  Server was started (port: 1984)
  HTTP Server was started (port: 8984)
  Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java
 heap
  space
  at
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
  at
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
  at
 
 org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
  at
 
 org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
  at
 
 org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
  at
 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
  at java.lang.Thread.run(Thread.java:744)
 
 
  --
  - Mansi




-- 
- Mansi


Re: [basex-talk] Out Of Memory

2014-11-06 Thread Mansi Sheth
Interesting idea. I thought of using db partitioning, but didn't pursue it
further, mainly due to the thought process below.

Currently, I am ingesting ~3000 XML files, storing ~50 XML files per DB,
and this will be growing quickly. So, the below approach would lead to ~3000
more files (and increasing), increasing I/O operations considerably for
further pre-processing.

However, I don't really care if the process takes a few minutes to a few hours
(as long as it's not day(s) ;)). Given the situation and my options, I will
surely try this.

The database is currently indexed at the attribute level, as that's what I
will be querying the most. Do you think I should do anything differently?

Thanks,
- Mansi

On Thu, Nov 6, 2014 at 10:48 AM, Fabrice Etanchaud fetanch...@questel.com
wrote:

  Hi Mansi,



 Here you have a natural partition of your data : the files you ingested.

 So my first suggestion would be to query your data on a file basis:



  for $doc in db:open('your_collection_name')
  let $file-name := db:path($doc)
  return
    file:write(
      $file-name,
      <names> {
        for $name in $doc//E/@name/data()
        return <name>{ $name }</name>
      } </names>
    )



 Is it for indexing ?



 Hope it helps,



 Best regards,



 Fabrice Etanchaud

 Questel/Orbit



 *De :* basex-talk-boun...@mailman.uni-konstanz.de [mailto:
 basex-talk-boun...@mailman.uni-konstanz.de] *De la part de* Mansi Sheth
 *Envoyé :* jeudi 6 novembre 2014 16:33
 *À :* Christian Grün
 *Cc :* BaseX
 *Objet :* Re: [basex-talk] Out Of Memory



 This would need a lot of details, so bear with me below:



 Briefly, my XML files look like:



 <A name="...">
   <B name="...">
     <C name="...">
       <D name="...">
         <E name="..."/>



 A can contain B, C or D, and B, C or D can contain E. We have 1000s
 (currently 3000 in my test data set) of such XML files, of 50MB size on
 average. It's tons of data! Currently, my database is ~18GB in size.



 Query: /A/*//E/@name/string()



 This query was going OOM within a few minutes.



 I tried a few whitelisting approaches, with a contains clause, to truncate
 the result set. That didn't help either. So now I am out of ideas. This is
 with the JVM given 10GB of dedicated memory.



 Once the above query works and doesn't go Out Of Memory, I will also need
 the corresponding file names:



 XYZ.xml //E/@name

 PQR.xml //E/@name



 Let me know if you need more details to appreciate the issue.

 - Mansi



 On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com
 wrote:

 Hi Mansi,

 I think we need more information on the queries that are causing the
 problems.

 Best,
 Christian




 On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote:
  Hello,
 
  I have a use case where I have to extract lots of information from each
  XML in each DB. Something like attribute values of most of the nodes in an
  XML. Such queries go Out Of Memory with the exception below. I am giving
  it ~12GB of RAM on an i7 processor. Well, I can't complain here since I am
  most definitely asking for loads of data, but is there any way I can get
  these kinds of data successfully?
 
  mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
  BaseX 8.0 beta b45c1e2 [Server]
  Server was started (port: 1984)
  HTTP Server was started (port: 8984)
  Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
  at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
  at java.lang.Thread.run(Thread.java:744)
 
 
  --
  - Mansi





 --

 - Mansi




-- 
- Mansi


Re: [basex-talk] Out Of Memory

2014-11-06 Thread Mansi Sheth
I will be doing tons of post-processing. I never use the GUI; I either use
REST through cURL or the command line.

I basically need data in the following format:

XML File Name, @name

I am trying to whitelist, picking up only values for
starts-with(@name, 'pqr'), where 'pqr' is one of ~150 values.

My file names are essentially IDs/keys, which I will need to map further to
some values using SQLite, and maybe group by them, etc.

So basically I am trying to visualize some data based on which XML files it
exists in. So yes, count(query) would run fine, but it won't serve much
purpose, since I still need the value 'pqr'.
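The lookup described above can be sketched in XQuery (an untested sketch: the database name 'mydb' and the prefix list are placeholders; the real whitelist would be a ~150-item sequence):

```xquery
(: Sketch: emit "file name, @name" lines, keeping only names that
   start with one of the whitelisted prefixes. :)
let $prefixes := ('pqr', 'abc')  (: placeholder for the ~150 values :)
for $e in db:open('mydb')//E
let $name := $e/@name/string()
where some $p in $prefixes satisfies starts-with($name, $p)
return db:path($e) || ', ' || $name
```

Since this emits one line per match instead of building the full node sequence first, it may also be gentler on the heap than caching all results.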

- Mansi


On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün christian.gr...@gmail.com
wrote:

  Query: /A/*//E/@name/string()

 In the GUI, all results will be cached, so you could think about
 switching to command line.

 Do you really need to output all results, or do you do some further
 processing with the intermediate results?

 For example, the query count(/A/*//E/@name/string()) will probably
 run without getting stuck.


 
  This query, was going OOM, within few mins.
 
  I tried a few ways, of whitelisting, with contain clause, to truncate the
  result set. That didn't help too. So, now I am out of ideas. This is
 giving
  JVM 10GB of dedicated memory.
 
  Once, above query works and doesn't go Out Of Memory, I also need
  corresponding file names too:
 
  XYZ.xml //E/@name
  PQR.xml //E/@name
 
  Let me know if you would need more details, to appreciate the issue ?
  - Mansi
 
  On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün 
 christian.gr...@gmail.com
  wrote:
 
  Hi Mansi,
 
  I think we need more information on the queries that are causing the
  problems.
 
  Best,
  Christian
 
 
 
  On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com
 wrote:
   Hello,
  
   I have a use case where I have to extract lots of information from each
   XML in each DB. Something like attribute values of most of the nodes in
   an XML. Such queries go Out Of Memory with the exception below. I am
   giving it ~12GB of RAM on an i7 processor. Well, I can't complain here
   since I am most definitely asking for loads of data, but is there any
   way I can get these kinds of data successfully?
  
   mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
   BaseX 8.0 beta b45c1e2 [Server]
   Server was started (port: 1984)
   HTTP Server was started (port: 8984)
   Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space
   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
   at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
   at java.lang.Thread.run(Thread.java:744)
  
  
   --
   - Mansi
 
 
 
 
  --
  - Mansi




-- 
- Mansi


Re: [basex-talk] Dynamic Evaluation of XQUERY

2014-11-03 Thread Mansi Sheth
Thanks Christian,

The second query below worked beautifully.

Now I am trying to get the db:path of the dynamic query's results. Code:

declare variable $n as xs:string external;
declare option output:item-separator "&#xa;";

let $aPath :=
  for $db in db:list()
  let $query :=
    "declare variable $db external; " ||
    "db:open($db)" || $n
  return xquery:eval($query,
    map { 'db': $db, 'query': $n })

for $elem in $aPath
return db:path($elem)

and am getting the exception below when it is called:

mansi@work:BigData mansiadmin$ basex -b\$n='query' get_paths.xq
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 7.9
Java: Oracle Corporation, 1.7.0_45
OS: Mac OS X, x86_64
Stack Trace:
java.lang.NullPointerException
at org.basex.query.value.item.Str.get(Str.java:49)
at org.basex.query.func.FNDb.path(FNDb.java:489)
at org.basex.query.func.FNDb.item(FNDb.java:128)
at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:45)
at org.basex.query.func.FNDb.iter(FNDb.java:92)
at org.basex.query.gflwor.GFLWOR$2.next(GFLWOR.java:78)
at org.basex.query.MainModule$1.next(MainModule.java:98)
at org.basex.core.cmd.AQuery.query(AQuery.java:91)
at org.basex.core.cmd.XQuery.run(XQuery.java:22)
at org.basex.core.Command.run(Command.java:329)
at org.basex.core.Command.execute(Command.java:94)
at org.basex.server.LocalSession.execute(LocalSession.java:121)
at org.basex.server.Session.execute(Session.java:37)
at org.basex.core.CLI.execute(CLI.java:106)
at org.basex.BaseX.init(BaseX.java:123)
at org.basex.BaseX.main(BaseX.java:42)
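A possible workaround sketch (untested, and assuming the external query $n returns database nodes): db:path() needs database nodes, so if $n ends in something like /string(), calling db:path($elem) on the atomized results will fail. Computing the path inside the evaluated query avoids the problem:

```xquery
(: Sketch: pair each result node with its database path inside
   xquery:eval, instead of calling db:path afterwards. :)
declare variable $n as xs:string external;

for $db in db:list()
let $query :=
  "declare variable $db external; " ||
  "for $e in db:open($db)" || $n ||
  " return db:path($e)"
return xquery:eval($query, map { 'db': $db })
```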


On Thu, Oct 30, 2014 at 5:54 AM, Christian Grün christian.gr...@gmail.com
wrote:

 Hi Mansi,

 you have been close! It could work with the following query (I haven't
 tried it out, though):

 _ get_query_result.xq 

 declare variable $n external;
 declare option output:item-separator "&#xa;";

 let $aList :=
   for $name in db:list()
   let $db := db:open($name)
   return xquery:eval($n, map { '': $db })

 return distinct-values($aList)
 __

 In this code, I'm opening the database in the main loop, and I then
 bind it to the empty string. This way, the database will be the
 context of the query to be evaluated, and you won't have to deal
 with bugs that arise from the concatenation of db:open and the query
 string.

  1. Can we assign dynamic values as a value to a map's key ?
  2. Can I map have more than one key, in query:eval ?

 This is both possible. As you see in the following query, you'll again
 have to declare the variables that you want to bind. I agree this
 causes a lot of code, so we may simplify it again in a future version
 of BaseX:
 __

 let $n := '/a/b/c'
 for $db in db:list()
 let $query :=
   "declare variable $db external; " ||
   "db:open($db)" || $n
 return xquery:eval($query,
   map { 'db': $db, 'query': $n })

 Best,
 Christian




-- 
- Mansi


[basex-talk] Dynamic Evaluation of XQUERY

2014-10-29 Thread Mansi Sheth
Hello,

I want to devise a generic XQuery script which accepts the actual XPath to
be run across all documents in all databases, from the command line.
Something like:

curl -i 
http://localhost:8984/rest?run=get_query_result.xqn=/root/*//calls/@name/string()


Basically, parameter n would hold the query.

I am trying this using the XQuery module's eval function, but am not
succeeding. My script is something like the following:

declare variable $n as xs:string external;
declare option output:item-separator "&#xa;";

let $aList :=
  for $db in db:list()
  let $vars := map { 'db_name': $db, 'query': $n }
  let $query_to_execute := "db:open($db_name)" || $query
  return xquery:eval($query_to_execute, $vars)

return distinct-values($aList)

My questions are:

1. Can we assign dynamic values to a map's keys?
2. Can a map have more than one key in xquery:eval?

Please point me in the right direction, or explain what I am doing wrong in
the above code.

- Mansi


Re: [basex-talk] Architecture Question

2014-10-22 Thread Mansi Sheth
Christian,

Thanks for all your responses. It truly helps a lot.

re: Importing data into databases: I realized that, for the extent of this
POC, I will just count the number of docs in each database (currently
programmed to be 50) and keep creating new databases. The structure of the
data is the same, but it's nested in nature: a folder can have folders,
which can have files, etc. Usually it won't be more than 4 levels deep.
That's a good tip, guessing the number of nodes based on byte size. I guess
for the time being I will move on with just storing 50 docs per DB.

re: terabytes of data: Well, I am planning on using ~6 months' worth of data
for any analysis and discarding data prior to that (leaving it around in
backups). Obviously, I would go some cloud route for such resources; we'll
see how much budget I can manage to get :) I am very positive about this. So
no, it's not only a theoretical assumption as far as I can see.

re: querying: Currently, I am looking into querying these databases. I am
exploring REST for it. From the documentation, it seems our only option is
supporting these queries (on the server side) using XQuery or RESTXQ, no
Java/Python? I am well versed with XPath and XSLT, and am gearing up towards
XQuery now. But it would be a little easier (just my personal preference :))
to manipulate data in Java/Python before serving it back to the client. Is
there any such facility? Something like:

http://localhost:8984/rest?run=getData.java;

similarly for Python?
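For comparison, the closest built-in facility is RESTXQ, which maps annotated XQuery functions to HTTP endpoints (the module namespace, function name, and path below are hypothetical, and the snippet is an untested sketch):

```xquery
(: Hypothetical RESTXQ endpoint: GET /getData?name=... :)
module namespace app = "http://example.org/app";

declare
  %rest:GET
  %rest:path("/getData")
  %rest:query-param("name", "{$name}")
function app:get-data($name as xs:string?) {
  <result>{ $name }</result>
};
```

Java or Python would then sit on the client side of the REST call, post-processing the response.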

- Mansi

Some preliminary statistics: Imported 2050 XML documents in 22 min
(including indexing on attributes).

On Sun, Oct 19, 2014 at 6:14 PM, Christian Grün christian.gr...@gmail.com
wrote:

 Hi Mansi,

   Is there some book/resource you can point me to, which helps better
 visualize NXD ?

 sorry for letting you wait. If you want to know more about native XML
 databases, I recommend you to have a closer look at various articles
 in our Wiki (e. g. [1,2]). It will also be helpful if you get into the
 basics of XQuery [3].

 Have you tried to realize some of the hints I gave in my previous mails?

  I am trying to distribute data across multiple databases. I can't
 distribute
  based on day, as there could very well be situation, where single day's
 data
  could more than capacity of BaseX DB.

 If 2 billion XML nodes per day are not enough, you will probably need
 to create more than one database per day. Via the info db command,
 you see how many nodes are currently stored in a database, but there
 is no cheap solution to find out the number of nodes of an incoming
 document, because XML documents can be very heterogeneous. Some
 questions back:

 * Do you have some more information on the data you want to store?
 * Are all documents similar or do they vary greatly? If the documents
 are somewhat similar, you can usually estimate the number of nodes by
 looking at the byte size.
 * Do you know that you will really need to store lots of terabytes of
 XML data, or it is more like a theoretical assumption?

 Christian

 [1] http://docs.basex.org/wiki/Database
 [2] http://docs.basex.org/wiki/Table_of_Contents
 [3] http://docs.basex.org/wiki/Xquery




-- 
- Mansi


Re: [basex-talk] web application on a local installation ?

2014-10-16 Thread Mansi Sheth
EM,

Are you still facing that? I too installed using homebrew, started the
service using basexhttp, and made an HTTP request:

http://localhost:8984/rest/db_name?query=query

and everything worked fine.

Is your database on the local machine from where you started basexhttp? I
initially had my database on an external HDD, and the service wasn't
starting.

Hope this helps,
- Mansi

On Thu, Oct 9, 2014 at 3:51 AM, Emmanuelle Morlock 
emmanuelle.morl...@mom.fr wrote:

  Hi,
 sorry to ask a basic question, but I'm a newbie who doesn't always
 understand all the prerequisites in the documentation. If I can't find help
 here, just tell me.

 I tried to install a local instance of BaseX and use the web application
 features.
 I work on a Mac and installed BaseX with homebrew.
 After changing the password of the admin user,
 and creating a database,
 I launched the httpserver and typed http://localhost:8984/ into my browser.
 The result is:
 HTTP ERROR: 503

 Problem accessing /webapp. Reason:

 Service Unavailable

 what am I missing ?
 is it even possible to use a web app on a local computer ?
 thanks in advance for your help...

 EM




-- 
- Mansi


Re: [basex-talk] Architecture Question

2014-10-16 Thread Mansi Sheth
I am trying to distribute data across multiple databases. I can't
distribute based on day, as there could very well be a situation where a
single day's data is more than the capacity of a BaseX DB. From the
statistics page, the only other way I can distribute is based on the number
of nodes. But going with that, I am not able to find a way to access the
number of nodes in a db programmatically. Further, I am clueless whether I
can even find the number of nodes of the current doc to be imported.

So,

currentDocToImport = a.xml
??NodeNo(a.xml)

NumberOfNodes(LastDB) = ??

Do you guys agree this is even a way to go? Can someone give me pointers on
how to find the above 2 values? Any other thoughts are always welcome ...
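Both values can be approximated in XQuery (an untested sketch; 'lastdb' and 'a.xml' are placeholder names, and both counts traverse every node, so they are not cheap):

```xquery
(: Sketch: node count of an existing database vs. a not-yet-imported file. :)
let $db-nodes  := count(db:open('lastdb')/descendant-or-self::node())
let $doc-nodes := count(doc('a.xml')/descendant-or-self::node())
return <counts db="{$db-nodes}" doc="{$doc-nodes}"/>
```

Note that descendant-or-self::node() does not count attribute nodes, so this is a lower bound rather than an exact match for the database's internal node count.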

- Mansi

On Tue, Oct 7, 2014 at 5:35 AM, Christian Grün christian.gr...@gmail.com
wrote:

 Dear Mansi,

  1. I have 1000s of XML files (each between 50MB-400MB) and this is going
 to
  grow exponentially (~200 / per day). So, my question is how scalable is
  BaseX ? Can I configure it to use data from my external HDD, in my
 initial
  prototype ?

 So this means you want to add approx. 40 GB of XML files per day, right,
 amounting to 14 TB/year? This sounds quite a lot indeed. You can have
 a look at our statistics page [1]; it gives you some insight into the
 current limits of BaseX.

 However, all limits are per single database. You can distribute your
 data in multiple databases and address multiple databases with a
 single XPath/XQuery request. For example, you could create a new
 database every day and run a query over all these databases:

   for $db in db:list()
   return db:open($db)/path/to/your/data

  2. I plan to heavily use XPATH, for data retrieval. Does BaseX, use any
  multi-processing, multi-threading to speed up search ? Any concurrent
  processing ?

 Read-only requests will automatically be multithreaded. If a single
 query leads to heavy I/O requests, it may be that single-threaded
 processing will give you better results (because hard drives are often
 not very good at reading data in parallel).

  3. Can I do some post-processing on searched and retrieved data ? Like
  sorting, unique elements etc ?

 With XQuery (3.0), you can do virtually anything with your data. In
 most of our data-driven scenarios, all data processing is completely
 done in BaseX. Some plain examples can be found in our Wiki [2].

 Hope this helps,
 Christian

 [1] http://docs.basex.org/wiki/Statistics
 [2] http://docs.basex.org/wiki/XQuery_3.0




-- 
- Mansi


Re: [basex-talk] Architecture Question

2014-10-10 Thread Mansi Sheth
Christian,

So, going ahead with my POC and use cases we plan to solve, I have a few
more database architecture questions..

1. Is there a way we can have a table with multiple columns? One of the
columns would be an ID and the others would be different XML information for
that ID.

2. Can I map the above table with a relational table to perform join queries
on the ID?

Thanks,
- Mansi

On Wed, Oct 8, 2014 at 12:53 PM, Christian Grün christian.gr...@gmail.com
wrote:

  I just created a single Database with ~190 XML files of size 8.5 GB
 total.
  Activated indexes as well. Creating database using basexgui took close
 to an
  hour. Running a simple XQUERY took ~3 min. Database was created on an
  external USB 3.0 HDD. I will obviously be creating new databases across
  drives (if this POC is successful, will surely go for cloud) to scale it.
 
  For time being, any and all tips are welcomes to optimize performance.

 Indeed performance should be much better if databases are created and
 queried on HDs or SSDs. Feel free to send us your queries if execution
 time is not good enough.

  May be I will soon contribute to the statistics pages :)

 Thanks,
 Christian




-- 
- Mansi


Re: [basex-talk] Architecture Question

2014-10-10 Thread Mansi Sheth
On Fri, Oct 10, 2014 at 10:31 AM, Christian Grün christian.gr...@gmail.com
wrote:

 Hi Mansi,

 Out of interest: why don't you simply store all documents in the
 database and use the document path as ID?


I am storing deeply nested hierarchical data in XML files. Simply put, most
of my queries are going to be relative (e.g. //@name). So I am assuming it
would be a huge performance hit, especially when I know each ID will most
definitely have multiple XML documents. Correct me if I am wrong here.


 as BaseX is a native XML store, there is no way to store data in
 structures like tables. However, due to the flexibility of XML
 structures, the usual way is to create another document or database
 that contains ID and additional meta data.


I don't know if I follow you completely here. Is there some metadata
facility I can use which maps each XML file stored in the NXD to the
relational data you discussed above, which I can then use for the mapping?
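One way to realize the suggestion of a separate metadata document (a hypothetical sketch: the 'docs' and 'meta' database names and the <entry> structure are made up) is a small mapping database joined on at query time:

```xquery
(: Sketch: a 'meta' database holds <entry id="..." label="..."/> records;
   document paths in the 'docs' database serve as the join key. :)
for $doc in db:open('docs')
let $id := db:path($doc)
let $label := db:open('meta')//entry[@id = $id]/@label/string()
return <result id="{$id}" label="{$label}"/>
```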



 Best,
 Christian




-- 
- Mansi


Re: [basex-talk] Architecture Question

2014-10-08 Thread Mansi Sheth
Thanks Christian.

re: size of data: I am hoping some days will be quieter than discussed
below. But yes, it's going to be a lot of data.

I just created a single database with ~190 XML files of 8.5 GB total size,
and activated indexes as well. Creating the database using basexgui took
close to an hour. Running a simple XQuery took ~3 min. The database was
created on an external USB 3.0 HDD. I will obviously be creating new
databases across drives (if this POC is successful, I will surely go for the
cloud) to scale it.

For the time being, any and all tips are welcome to optimize performance.

Maybe I will soon contribute to the statistics pages :)

- Mansi

On Tue, Oct 7, 2014 at 5:35 AM, Christian Grün christian.gr...@gmail.com
wrote:

 Dear Mansi,

  1. I have 1000s of XML files (each between 50MB-400MB) and this is going
 to
  grow exponentially (~200 / per day). So, my question is how scalable is
  BaseX ? Can I configure it to use data from my external HDD, in my
 initial
  prototype ?

 So this means you want to add approx. 40 GB of XML files per day, right,
 amounting to 14 TB/year? This sounds quite a lot indeed. You can have
 a look at our statistics page [1]; it gives you some insight into the
 current limits of BaseX.

 However, all limits are per single database. You can distribute your
 data in multiple databases and address multiple databases with a
 single XPath/XQuery request. For example, you could create a new
 database every day and run a query over all these databases:

   for $db in db:list()
   return db:open($db)/path/to/your/data

  2. I plan to heavily use XPATH, for data retrieval. Does BaseX, use any
  multi-processing, multi-threading to speed up search ? Any concurrent
  processing ?

 Read-only requests will automatically be multithreaded. If a single
 query leads to heavy I/O requests, it may be that single-threaded
 processing will give you better results (because hard drives are often
 not very good at reading data in parallel).

  3. Can I do some post-processing on searched and retrieved data ? Like
  sorting, unique elements etc ?

 With XQuery (3.0), you can do virtually anything with your data. In
 most of our data-driven scenarios, all data processing is completely
 done in BaseX. Some plain examples can be found in our Wiki [2].

 Hope this helps,
 Christian

 [1] http://docs.basex.org/wiki/Statistics
 [2] http://docs.basex.org/wiki/XQuery_3.0




-- 
- Mansi


[basex-talk] Architecture Question

2014-10-06 Thread Mansi Sheth
Hello,

I have been going through and comparing different Native XML Databases, and
so far I am liking BaseX. However, there are still a few questions
unanswered before I make a final choice:

1. I have 1000s of XML files (each between 50MB and 400MB), and this is
going to grow exponentially (~200 per day). So my question is: how scalable
is BaseX? Can I configure it to use data from my external HDD in my initial
prototype?

2. I plan to heavily use XPath for data retrieval. Does BaseX use any
multi-processing or multi-threading to speed up search? Any concurrent
processing?

3. Can I do some post-processing on searched and retrieved data, like
sorting, unique elements, etc.?

- Mansi