Re: [basex-talk] Runtime Exception
Upgraded to the latest BaseX version. Still getting runtime errors like this:

HTTP/1.1 400 Bad Request
Content-Type: text/plain;charset=UTF-8
Content-Length: 3222
Server: Jetty(8.1.18.v20150929)

Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.4.4
Java: Oracle Corporation, 1.7.0_95
OS: Linux, amd64
Stack Trace:
java.lang.NullPointerException
  at org.basex.data.DiskData.write(DiskData.java:120)
  at org.basex.data.DiskData.close(DiskData.java:140)
  at org.basex.core.Datas.unpin(Datas.java:53)
  at org.basex.core.cmd.Close.close(Close.java:45)
  at org.basex.query.QueryResources.close(QueryResources.java:108)
  at org.basex.query.QueryContext.close(QueryContext.java:603)
  at org.basex.query.QueryProcessor.close(QueryProcessor.java:262)
  at org.basex.core.cmd.AQuery.query(AQuery.java:99)
  at org.basex.core.cmd.XQuery.run(XQuery.java:22)
  at org.basex.core.Command.run(Command.java:398)
  at org.basex.http.rest.RESTCmd.run(RESTCmd.java:99)
  at org.basex.http.rest.RESTQuery.query(RESTQuery.java:74)
  at org.basex.http.rest.RESTRun.run0(RESTRun.java:41)
  at org.basex.http.rest.RESTCmd.run(RESTCmd.java:65)
  at org.basex.core.Command.run(Command.java:398)
  at org.basex.core.Command.execute(Command.java:100)
  at org.basex.core.Command.execute(Command.java:123)
  at org.basex.http.rest.RESTServlet.run(RESTServlet.java:22)
  at org.basex.http.BaseXServlet.service(BaseXServlet.java:64)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:370)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:231)
  at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:745)

On Fri, Apr 29, 2016 at 1:49 PM, Christian Grün <christian.gr...@gmail.com> wrote:
> Hi Mansi,
>
> This error shouldn’t show up anymore with more recent versions of
> BaseX. Could you try the latest version?
>
> Regarding the new error, could you please tell us more about what you
> were doing with your data? Did you read and write at the same time?
> Did you use different BaseX instances to access the data in parallel?
>
> Thanks
> Christian
>
> On Fri, Apr 29, 2016 at 6:28 PM, Mansi Sheth <mansi.sh...@gmail.com> wrote:
> > Hello,
> >
> > So, now I am stuck. I am not even able to access any database:
> >
> > ubuntu@ip-10-0-0-83:~$ basex
> > BaseX 8.2.3 [Standalone]
> > Try help to get more information.
> >
> >> list
> > Improper use? Potential bug? Your feedback is welcome:
> > Contact: basex-talk@mailman.uni-konstanz.de
> > Version: BaseX 8.2.3
> > Java: Oracle Corporation, 1.7.0_95
> > OS: Linux, amd64
> > Stack Trace:
> > java.lang.ArrayIndexOutOfBoundsException: 0
> > at org.basex.util.Version.<init>(Version.java:33)
> > at org.basex.util.Version.<init>(Version.java:24)
> > at org.basex.dat
Re: [basex-talk] Runtime Exception
Hello, So, now I am stuck. I am not even able to access any database: ubuntu@ip-10-0-0-83:~$ basex BaseX 8.2.3 [Standalone] Try help to get more information. > list Improper use? Potential bug? Your feedback is welcome: Contact: basex-talk@mailman.uni-konstanz.de Version: BaseX 8.2.3 Java: Oracle Corporation, 1.7.0_95 OS: Linux, amd64 Stack Trace: java.lang.ArrayIndexOutOfBoundsException: 0 at org.basex.util.Version.(Version.java:33) at org.basex.util.Version.(Version.java:24) at org.basex.data.MetaData.read(MetaData.java:315) at org.basex.data.MetaData.read(MetaData.java:262) at org.basex.core.cmd.List.list(List.java:83) at org.basex.core.cmd.List.run(List.java:52) at org.basex.core.Command.run(Command.java:398) at org.basex.core.Command.execute(Command.java:100) at org.basex.api.client.LocalSession.execute(LocalSession.java:132) at org.basex.api.client.Session.execute(Session.java:36) at org.basex.core.CLI.execute(CLI.java:103) at org.basex.core.CLI.execute(CLI.java:87) at org.basex.BaseX.console(BaseX.java:191) at org.basex.BaseX.(BaseX.java:166) at org.basex.BaseX.main(BaseX.java:42) > On Tue, Apr 26, 2016 at 12:14 PM, Christian Grün <christian.gr...@gmail.com> wrote: > Hi Mansi, > > Thanks for the feedback. Errors like this sometime occur if databases > are requested from different JVMs at the same time. See e.g. [1] for > more information. > > Cheers > Christian > > [1] http://docs.basex.org/wiki/Startup#Concurrent_Operations > > > On Tue, Apr 26, 2016 at 6:02 PM, Mansi Sheth <mansi.sh...@gmail.com> > wrote: > > I did try the inspect command on all databases, which should there are no > > inconsistencies.I was logging any exceptions, in my java code, in case of > > errors, and that showed me, which database in particular was in problem, > > dropping it helped. > > > > This was the first time I saw it. I was worried, DB has grown till the > point > > of not being supported, when I actually panicked. 
> > > > Thanks, > > - Mansi > > > > On Tue, Apr 26, 2016 at 3:27 AM, Christian Grün < > christian.gr...@gmail.com> > > wrote: > >> > >> Dear Mansi, > >> > >> you could try to run the INSPECT command on the affected database, or > all > >> databases, in order to find out if your database has gone corrupt. Did > you > >> repeatedly come across this error? > >> > >> Best, > >> Christian > >> > >> Am 25.04.2016 16:45 schrieb "Mansi Sheth" <mansi.sh...@gmail.com>: > >> > > >> > Hello, > >> > > >> > My current BaseXDB is at 920GB, with ~230 databases... I run jetty > >> > server visa basexhttp script with giving it explicit 30GB of RAM. > While > >> > trying to access a query, thru REST api via XQUERY, I get below error. > >> > > >> > HTTP/1.1 400 Bad Request^M > >> > Content-Type: text/plain;charset=UTF-8^M > >> > Content-Length: 4207^M > >> > Server: Jetty(8.1.16.v20140903)^M > >> > ^M > >> > Improper use? Potential bug? Your feedback is welcome: > >> > Contact: basex-talk@mailman.uni-konstanz.de > >> > Version: BaseX 8.2.3 > >> > Java: Oracle Corporation, 1.7.0_95 > >> > OS: Linux, amd64 > >> > Stack Trace: > >> > java.lang.RuntimeException: Data Access out of bounds: > >> > - pre value: 126882320 > >> > - #used blocks: 495643 > >> > - #total locks: 495643 > >> > - access: 495642 (495643 > 495642] > >> > at org.basex.util.Util.notExpected(Util.java:60) > >> > at > >> > org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:458) > >> > at > >> > org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:148) > >> > at org.basex.data.Data.kind(Data.java:306) > >> > at org.basex.query.value.node.DBNode.(DBNode.java:51) > >> > at > org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:68) > >> > at > org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:22) > >> > at org.basex.query.value.seq.Seq$1.next(Seq.java:77) > >> > at org.basex.query.expr.path.IterPath$1.next(IterPath.java:58) > >> > at 
org.basex.query.expr.path.IterPath$1.next(IterPath.java:36) > >> > at org.basex.query.MainModule$1.next(MainModule.java:114) > >> > at > >> > org.basex.query.func.StandardFunc.cache(StandardFunc.java:384) > >> > at >
Re: [basex-talk] Runtime Exception
I did try the INSPECT command on all databases, which showed there are no inconsistencies. I was logging any exceptions in my Java code, and that showed me which database in particular had the problem; dropping it helped.

This was the first time I saw it. I panicked because I was worried the DB had grown to the point of no longer being supported.

Thanks,
- Mansi

On Tue, Apr 26, 2016 at 3:27 AM, Christian Grün <christian.gr...@gmail.com> wrote:
> Dear Mansi,
>
> you could try to run the INSPECT command on the affected database, or all
> databases, in order to find out if your database has gone corrupt. Did you
> repeatedly come across this error?
>
> Best,
> Christian
>
> On 25.04.2016 at 16:45, "Mansi Sheth" <mansi.sh...@gmail.com> wrote:
> >
> > Hello,
> >
> > My current BaseXDB is at 920 GB, with ~230 databases... I run the Jetty
> > server via the basexhttp script, giving it an explicit 30 GB of RAM. While
> > running a query through the REST API via XQuery, I get the error below.
> >
> > HTTP/1.1 400 Bad Request
> > Content-Type: text/plain;charset=UTF-8
> > Content-Length: 4207
> > Server: Jetty(8.1.16.v20140903)
> >
> > Improper use? Potential bug?
Your feedback is welcome: > > Contact: basex-talk@mailman.uni-konstanz.de > > Version: BaseX 8.2.3 > > Java: Oracle Corporation, 1.7.0_95 > > OS: Linux, amd64 > > Stack Trace: > > java.lang.RuntimeException: Data Access out of bounds: > > - pre value: 126882320 > > - #used blocks: 495643 > > - #total locks: 495643 > > - access: 495642 (495643 > 495642] > > at org.basex.util.Util.notExpected(Util.java:60) > > at > org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:458) > > at > org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:148) > > at org.basex.data.Data.kind(Data.java:306) > > at org.basex.query.value.node.DBNode.(DBNode.java:51) > > at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:68) > > at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:22) > > at org.basex.query.value.seq.Seq$1.next(Seq.java:77) > > at org.basex.query.expr.path.IterPath$1.next(IterPath.java:58) > > at org.basex.query.expr.path.IterPath$1.next(IterPath.java:36) > > at org.basex.query.MainModule$1.next(MainModule.java:114) > > at org.basex.query.func.StandardFunc.cache(StandardFunc.java:384) > > at > org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:129) > > at > org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:59) > > at > org.basex.query.func.xquery.XQueryEval.value(XQueryEval.java:49) > > at org.basex.query.expr.gflwor.GFLWOR.value(GFLWOR.java:77) > > at org.basex.query.QueryContext.value(QueryContext.java:421) > > at org.basex.query.expr.gflwor.Let$LetEval.next(Let.java:187) > > at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:95) > > at org.basex.query.MainModule$1.next(MainModule.java:114) > > at org.basex.core.cmd.AQuery.query(AQuery.java:91) > > at org.basex.core.cmd.XQuery.run(XQuery.java:22) > > at org.basex.core.Command.run(Command.java:398) > > at org.basex.http.rest.RESTCmd.run(RESTCmd.java:99) > > at org.basex.http.rest.RESTQuery.query(RESTQuery.java:74) > > at 
org.basex.http.rest.RESTRun.run0(RESTRun.java:41) > > at org.basex.http.rest.RESTCmd.run(RESTCmd.java:65) > > at org.basex.core.Command.run(Command.java:398) > > at org.basex.core.Command.execute(Command.java:100) > > at org.basex.core.Command.execute(Command.java:123) > > at org.basex.http.rest.RESTServlet.run(RESTServlet.java:22) > > at org.basex.http.BaseXServlet.service(BaseXServlet.java:64) > > at javax.servlet.http.HttpServlet.service(HttpServlet.java:848) > > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) > > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503) > > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) > > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) > > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:
[basex-talk] Runtime Exception
Hello,

My current BaseXDB is at 920 GB, with ~230 databases... I run the Jetty server via the basexhttp script, giving it an explicit 30 GB of RAM. While running a query through the REST API via XQuery, I get the error below.

HTTP/1.1 400 Bad Request
Content-Type: text/plain;charset=UTF-8
Content-Length: 4207
Server: Jetty(8.1.16.v20140903)

Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.2.3
Java: Oracle Corporation, 1.7.0_95
OS: Linux, amd64
Stack Trace:
java.lang.RuntimeException: Data Access out of bounds:
- pre value: 126882320
- #used blocks: 495643
- #total locks: 495643
- access: 495642 (495643 > 495642]
  at org.basex.util.Util.notExpected(Util.java:60)
  at org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:458)
  at org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:148)
  at org.basex.data.Data.kind(Data.java:306)
  at org.basex.query.value.node.DBNode.<init>(DBNode.java:51)
  at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:68)
  at org.basex.query.value.seq.DBNodeSeq.itemAt(DBNodeSeq.java:22)
  at org.basex.query.value.seq.Seq$1.next(Seq.java:77)
  at org.basex.query.expr.path.IterPath$1.next(IterPath.java:58)
  at org.basex.query.expr.path.IterPath$1.next(IterPath.java:36)
  at org.basex.query.MainModule$1.next(MainModule.java:114)
  at org.basex.query.func.StandardFunc.cache(StandardFunc.java:384)
  at org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:129)
  at org.basex.query.func.xquery.XQueryEval.eval(XQueryEval.java:59)
  at org.basex.query.func.xquery.XQueryEval.value(XQueryEval.java:49)
  at org.basex.query.expr.gflwor.GFLWOR.value(GFLWOR.java:77)
  at org.basex.query.QueryContext.value(QueryContext.java:421)
  at org.basex.query.expr.gflwor.Let$LetEval.next(Let.java:187)
  at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:95)
  at org.basex.query.MainModule$1.next(MainModule.java:114)
  at org.basex.core.cmd.AQuery.query(AQuery.java:91)
  at org.basex.core.cmd.XQuery.run(XQuery.java:22)
  at org.basex.core.Command.run(Command.java:398)
  at org.basex.http.rest.RESTCmd.run(RESTCmd.java:99)
  at org.basex.http.rest.RESTQuery.query(RESTQuery.java:74)
  at org.basex.http.rest.RESTRun.run0(RESTRun.java:41)
  at org.basex.http.rest.RESTCmd.run(RESTCmd.java:65)
  at org.basex.core.Command.run(Command.java:398)
  at org.basex.core.Command.execute(Command.java:100)
  at org.basex.core.Command.execute(Command.java:123)
  at org.basex.http.rest.RESTServlet.run(RESTServlet.java:22)
  at org.basex.http.BaseXServlet.service(BaseXServlet.java:64)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:370)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:745)

--
- Mansi
[basex-talk] XQuery Help
Hello,

I need help with, hopefully, a simple XQuery. I want to extend the query below to run not on all documents in all databases, but only on those documents whose db:path (i.e. the original file name) contains "input string X". Any help appreciated.

declare variable $n as xs:string external;  (: command-line query, entered as the "n" variable :)
declare option output:item-separator "";    (: each element would be on a new line :)

(: run the input query on every XML document in every database :)
let $queryData :=
  for $db in db:list()
  (: assign dynamic variables to generate the query, to be used in eval :)
  let $query := "declare variable $db external; " || "db:open($db)" || $n
  return xquery:eval($query, map { 'db': $db, 'query': $n })
return distinct-values($queryData)

- Mansi
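[Editorial sketch, untested against your data: one way to restrict evaluation to matching documents is to filter each database by db:path before evaluating the query. $part is a variable name introduced here for "input string X"; the empty-string key in the bindings map sets the context item for xquery:eval in BaseX.]

```xquery
declare variable $n    as xs:string external;  (: query fragment, as above :)
declare variable $part as xs:string external;  (: part of the original file name :)

let $queryData :=
  for $db in db:list()
  (: keep only documents whose database path contains $part :)
  for $doc in db:open($db)[contains(db:path(.), $part)]
  (: evaluate the user query with the matching document as context item :)
  return xquery:eval($n, map { '': $doc })
return distinct-values($queryData)
```

This avoids building the query string by concatenation, since the document is passed as the evaluation context instead.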
Re: [basex-talk] Guidance on Indexing
Thanks Christian, as always a quick and detailed response.

1. I am not 100% clear if you are motivating me towards or against full-text indexing :)

2. Yes, I am dealing with GBs of XML files. I create new databases via the Java API, using the CreateDB class. Should I be using MainOptions to set the AUTOOPTIMIZE and UPDINDEX options before each new db creation? In the MainOptions class I didn't find any auto-optimize option; am I missing something? Since I am setting options through this method anyway, should I also set the FTINDEX or ATTRINDEX attribute (based on your response to 1) before creating each DB? I would hate to run an optimization script after each DB update (updates happen daily).

Please advise,
- Mansi

On Sun, Jan 3, 2016 at 4:52 PM, Christian Grün wrote:
> Hi Mansi,
>
> > 1. Most of my xqueries are of below nature
> >
> > '/Archives/descendant::apiCalls[contains(@name,"com.sun")]/@name', where
> > apiCalls could be 3-4 level under 'Archives'. Xqueries are accessed via REST
>
> The existing index structures won’t allow you to look for arbitrary
> substrings; see [1] for more information.
>
> You are right, the full-text index may be a possible way out. Prefix
> searches can be realized via the "using wildcards" option [2]:
>
>   //*[text() contains text "abc.*" using wildcards]
>
> Please note that the query string will always be "tokenized": if you
> are looking for "com.sun", you will also get results like "COM SUN!".
>
> > 2. I have 1000s of documents, spanning over 100 XML DB, with total space
> > around 400 GB currently. Each query is taking roughly 30 mins, to run.
> >
> > My concern is, at each DB update, I am using attribute indexing, but info
> > command on basex prompt tells me otherwise. Am I misreading something ? Is
> > there a way to fix this once DB is created ? Its takes me 48 hours, to
> > create DBs from scratch... :)
>
> If UPDINDEX and AUTOOPTIMIZE is false, you will need to call
> "OPTIMIZE" after your updates.
> If you create a new database, you can set UPDINDEX and AUTOOPTIMIZE to
> true. However, AUTOOPTIMIZE will get incredibly slow if you are
> working with gigabytes of XML data.
>
> > Reading thru UPDINDEX and AUTOOPTIMIZE ALL commands, tells me to open each
> > DB and run these commands. Is that my option ? Do we have a xquery script
> > somewhere which I can use to do this ?
>
> If your databases are called "db1" ... "db100", the following XQuery
> script will optimize all those databases:
>
>   for $i in 1 to 100
>   return db:optimize('db' || $i)
>
> You can also create a command script [3] with XQuery:
>
>   <commands>{
>     for $i in 1 to 100
>     return (
>       <open>{ 'db' || $i }</open>,
>       <optimize/>
>     )
>   }</commands>
>
> You can store the result as a .bxs file and run it afterwards.
>
> Before you create all index structures, you should probably run your
> queries on some smaller database instances and check out the "Query
> Info" panel in the GUI. It will tell you if an index is used or not.
>
> Best,
> Christian
>
> [1] http://docs.basex.org/wiki/Indexes#Value_Indexes
> [2] http://docs.basex.org/wiki/Full-Text#Match_Options
> [3] http://docs.basex.org/wiki/Commands#Command_Scripts

--
- Mansi
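[Editorial note: if the database names are not numbered consecutively, the numbered loop quoted above can be generalized by iterating over db:list() — a sketch, assuming the BaseX 8.x Database Module:]

```xquery
(: optimize every database on this instance, whatever its name :)
for $db in db:list()
return db:optimize($db)
```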
[basex-talk] Guidance on Indexing
Hello,

A very happy new year to all of you!

I have some very basic questions about indexing.

1. Most of my XQueries are of the nature below, where apiCalls could be 3-4 levels under 'Archives'. The queries are accessed via REST:

  /Archives/descendant::apiCalls[contains(@name,"com.sun")]/@name

Based on this, I used attribute indexing after each update to the DB. Am I correct? Should I have been using full-text indexing instead? Why?

2. I have 1000s of documents, spanning over 100 XML databases, with total space around 400 GB currently. Each query takes roughly 30 minutes to run. Though the performance is acceptable, I know I can do better with indexing. Currently, when I looked at one of the DBs:

> open bi_output_3
Database 'bi_output_3' was opened in 38.22 ms.
> info db
Database Properties
 Name: bi_output_3
 Size: 3938 MB
 Nodes: 16193129
 Documents: 35
 Binaries: 0
 Timestamp: 2016-01-03T13:40:40.000Z
Resource Properties
 Timestamp: 2016-01-03T13:40:40.776Z
 Encoding: UTF-8
 CHOP: true
Indexes
 Up-to-date: false
 TEXTINDEX: false
 ATTRINDEX: false
 FTINDEX: false
 LANGUAGE: English
 STEMMING: false
 CASESENS: false
 DIACRITICS: false
 STOPWORDS:
 UPDINDEX: false
 AUTOOPTIMIZE: false
 MAXCATS: 100
 MAXLEN: 96

When looking at its HDD footprint:

ubuntu@/BaseXDB/bi_output_3$ ls -l
total 4032992
-rw-rw-r-- 1 ubuntu ubuntu 2209449064 Jan  1 17:00 atv.basex
-rw-rw-r-- 1 ubuntu ubuntu          4 Jan  1 16:35 atvl.basex
-rw-rw-r-- 1 ubuntu ubuntu          0 Jan  1 16:35 atvr.basex
-rw-rw-r-- 1 ubuntu ubuntu       6414 Jan  3 13:40 doc.basex
-rw-rw-r-- 1 ubuntu ubuntu          6 Jan  1 17:00 ftxx.basex
-rw-rw-r-- 1 ubuntu ubuntu          0 Jan  1 17:00 ftxy.basex
-rw-rw-r-- 1 ubuntu ubuntu          0 Jan  1 17:00 ftxz.basex
-rw-rw-r-- 1 ubuntu ubuntu        829 Jan  3 13:40 inf.basex
-rw-rw-r-- 1 ubuntu ubuntu         28 Jan  1 17:00 swl.basex
-rw-rw-r-- 1 ubuntu ubuntu 1916444672 Jan  3 13:40 tbl.basex
-rw-rw-r-- 1 ubuntu ubuntu    3796037 Jan  3 13:40 tbli.basex
-rw-rw-r-- 1 ubuntu ubuntu      45462 Jan  1 17:00 txt.basex
-rw-rw-r-- 1 ubuntu ubuntu          4 Jan  1 16:35 txtl.basex
-rw-rw-r-- 1 ubuntu ubuntu          0 Jan  1 16:35 txtr.basex
ubuntu@/BaseXDB/bi_output_3$ pwd
/veracode/msheth/BaseXDB/bi_output_3

My concern is: at each DB update I am using attribute indexing, but the info command at the basex prompt tells me otherwise. Am I misreading something? Is there a way to fix this once the DB is created? It takes me 48 hours to create the DBs from scratch... :)

Reading through the UPDINDEX and AUTOOPTIMIZE ALL commands tells me to open each DB and run these commands. Is that my only option? Is there an XQuery script somewhere which I can use to do this?

Thanks,
- Mansi
Re: [basex-talk] [bxerr:BXDB0002] Too many open files
Thanks Joe for your input. I haven't tried all the options yet, but will surely go through them. I guess what I was trying to see is if there is a way I can optimize my XQueries to close open databases which they no longer need. Currently, my queries are of the nature below. I am wondering if there is a better way to deal with the two lines building and evaluating $query:

declare variable $n as xs:string external;
declare option output:item-separator "";

let $queryData :=
  for $db in db:list()
  let $query := "declare variable $db external; " || "db:open($db)" || $n
  return xquery:eval($query, map { 'db': $db, 'query': $n })
return distinct-values($queryData)

On Mon, Dec 7, 2015 at 1:30 PM, Joe Wicentowski <joe...@gmail.com> wrote:
> Hi Mansi,
>
> The results of ulimit can be misleading. See this article, which really
> helped me when I encountered this issue (though not with BaseX):
>
> https://underyx.me/2015/05/18/raising-the-maximum-number-of-file-descriptors.html
>
> Joe
>
> On Mon, Dec 7, 2015 at 1:22 PM, Mansi Sheth <mansi.sh...@gmail.com> wrote:
>
>> Thanks Christian,
>>
>> I had already set the open files limit on the OS:
>>
>> ubuntu@ip-10-0-0-83:~$ ulimit -Hn
>>
>> However, I still face the exact same problem. The process breaks at the
>> same db count:
>>
>> [bxerr:BXDB0002] Resource
>> "/veracode/msheth/BaseXDB/bi_output_715/inf.basex (Too many open files)"
>> not found.
>>
>> On Sat, Dec 5, 2015 at 7:53 AM, Christian Grün <christian.gr...@gmail.com> wrote:
>>
>>> Hi Mansi,
>>>
>>> If you are working with Linux, you may need to increase the maximum
>>> file limit with "ulimit -n" [1].
>>>
>>> Hope this helps,
>>> Christian
>>>
>>> [1] http://www.linuxhowtos.org/Tips%20and%20Tricks/ulimit.htm
>>>
>>> On Fri, Dec 4, 2015 at 7:52 PM, Mansi Sheth <mansi.sh...@gmail.com> wrote:
>>> > Hello,
>>> >
>>> > I am importing tons of XML files into BaseX.
Currently I have roughly >>> 1600 >>> > databases, I am starting basexhttp service, to access it over a web >>> service >>> > endpoint, thru a xquery file. Using BaseX 8.2.3. >>> > >>> > I am receiving below error: >>> > >>> > [bxerr:BXDB0002] Resource >>> "/veracode/msheth/BaseXDB/bi_output_713/inf.basex >>> > (Too many open files)" not found. >>> > >>> > basexhttp, is running with 10240M virtual memory. >>> > >>> > I can share the xquery file, if thats needed. >>> > >>> > Has anyone experienced this before ? Is there a limit on no of >>> databases >>> > supported by BaseX ? Is there some configuration option, which I can >>> use to >>> > close already queried database ? >>> > >>> > Thanks, >>> > - Mansi >>> >> >> >> >> -- >> - Mansi >> > > -- - Mansi
Re: [basex-talk] [bxerr:BXDB0002] Too many open files
Thanks Christian,

I had already set the open files limit on the OS:

ubuntu@ip-10-0-0-83:~$ ulimit -Hn

However, I still face the exact same problem. The process breaks at the same db count:

[bxerr:BXDB0002] Resource "/veracode/msheth/BaseXDB/bi_output_715/inf.basex (Too many open files)" not found.

On Sat, Dec 5, 2015 at 7:53 AM, Christian Grün <christian.gr...@gmail.com> wrote:
> Hi Mansi,
>
> If you are working with Linux, you may need to increase the maximum
> file limit with "ulimit -n" [1].
>
> Hope this helps,
> Christian
>
> [1] http://www.linuxhowtos.org/Tips%20and%20Tricks/ulimit.htm
>
> On Fri, Dec 4, 2015 at 7:52 PM, Mansi Sheth <mansi.sh...@gmail.com> wrote:
> > Hello,
> >
> > I am importing tons of XML files into BaseX. Currently I have roughly 1600
> > databases. I am starting the basexhttp service to access them over a web
> > service endpoint, through an XQuery file. Using BaseX 8.2.3.
> >
> > I am receiving the error below:
> >
> > [bxerr:BXDB0002] Resource "/veracode/msheth/BaseXDB/bi_output_713/inf.basex
> > (Too many open files)" not found.
> >
> > basexhttp is running with 10240M virtual memory.
> >
> > I can share the XQuery file if that's needed.
> >
> > Has anyone experienced this before? Is there a limit on the number of
> > databases supported by BaseX? Is there some configuration option which I
> > can use to close an already-queried database?
> >
> > Thanks,
> > - Mansi

--
- Mansi
[basex-talk] [bxerr:BXDB0002] Too many open files
Hello,

I am importing tons of XML files into BaseX. Currently I have roughly 1600 databases. I am starting the basexhttp service to access them over a web service endpoint, through an XQuery file. Using BaseX 8.2.3.

I am receiving the error below:

[bxerr:BXDB0002] Resource "/veracode/msheth/BaseXDB/bi_output_713/inf.basex (Too many open files)" not found.

basexhttp is running with 10240M virtual memory.

I can share the XQuery file if that's needed.

Has anyone experienced this before? Is there a limit on the number of databases supported by BaseX? Is there some configuration option which I can use to close an already-queried database?

Thanks,
- Mansi
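[Editorial sketch of checking and raising the per-process file-descriptor limit in the shell that launches basexhttp. A soft limit can be raised up to the hard limit without privileges; the persistent fix usually belongs in /etc/security/limits.conf or the service definition, which varies by distribution.]

```shell
# Show the current soft and hard limits for open file descriptors.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft=$soft hard=$hard"

# Raise the soft limit up to the hard limit for this shell and its
# children (e.g. a basexhttp process started from here).
if [ "$hard" != "unlimited" ]; then
  ulimit -Sn "$hard"
fi
echo "new soft=$(ulimit -Sn)"
```

Note that limits set this way apply only to the current shell and its children; a basexhttp started by an init system inherits that system's limits instead.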
[basex-talk] BaseX 8.2.3 data not shown until OS restart
Hello,

This is something very weird. I import a bunch of XML files into this latest version of BaseX using the Java API. Trying to access this data (via the basex command-line client or REST) doesn't show any databases at all. These databases show up only after restarting the OS. Is this a known issue, or just me?

- Mansi
Re: [basex-talk] Finding document based on filename
Thanks guys for all the expert comments. Currently, I am experimenting with the performance of just deleting and inserting using the Java API. If this process takes a tiny bit longer, I don't really care, is what I figured :) If it becomes unacceptable, I will use one of these suggestions. Thanks once again.

StringList databases = List.list(context);
String query = "";
for (String database : databases) {
  query = "db:list('" + database + "')";
  try {
    for (String fileName : query(query).split(" ")) {
      if (fileName.contains(XMLFileName.split("_")[1])) {
        query = "db:delete('" + database + "','" + fileName + "')";
        query(query);
        logger.info("Deleted " + fileName + " from " + database);
        retVal = true;
        break;
      }
    }
  } catch (BaseXException e) {
    e.printStackTrace();
  }
}

On Mon, Aug 31, 2015 at 9:45 PM, Martín Ferrari wrote:
> I forgot one thing: I got much better performance by just calling
> replace rather than delete and insert, but this is a db with more than one
> million records. If performance is not important, I believe either way will
> do.
>
> Martín.
>
> --
> From: ferrari_mar...@hotmail.com
> To: mansi.sh...@gmail.com; basex-talk@mailman.uni-konstanz.de
> Date: Mon, 31 Aug 2015 16:35:33 +
> Subject: Re: [basex-talk] Finding document based on filename
>
> Hi Mansi,
> I have a similar situation. I don't think there's a fast way to get
> documents by only knowing a part of their names. It seems you need to know
> the exact name. In my case, we might be able to group documents by a common
> id, so we might create subfolders inside the DB and store/get the contents
> of the subfolder directly, which is pretty fast.
> I've also tried indexing, but insertions got really slow (I assume
> maybe because indexing is not granular, it indexes all values) and we
> need performance.
>
> Oh, I've also tried using starts-with() instead of contains(), but it
> seems it does not pick up indexes.
>
> Martín.
> > -- > Date: Fri, 28 Aug 2015 16:52:37 -0400 > From: mansi.sh...@gmail.com > To: basex-talk@mailman.uni-konstanz.de > Subject: [basex-talk] Finding document based on filename > > Hello, > > I would be having 100s of databases, with each database having 100 XML > documents. I want to devise an algorithm, where given a part of XML file > name, i want to know which database(s) contains it, or null if document is > not currently present in any database. Based on that, add current document > into the database. This is to always maintain latest version of a document > in DB, and remove the older version, while adding newer version. > > So far, only way I could come up with is: > > for $db in all-databases: > open $db > $fileNames = list $db > for eachFileName in $fileNames: >if $eachFileName.contains(sub-xml filename): > add to ret-list-db > > return ret-list-db > > Above algorithm, seems highly inefficient, Is there any indexing, which > can be done ? Do you suggest, for each document insert, I should maintain a > separate XML document, which lists each file inserted etc. > > Once, i get hold of above list of db, I would be eventually deleting that > file and inserting a latest version of that file(which would have same > sub-xml file name). So, constant updating of this external document also > seems painful (Map be ?). > > Also, would it be faster, using XQUERY script files, thru java code, or > using Java API for such operations ? > > How do you all deal with such operations ? > > - Mansi > -- - Mansi
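[Editorial sketch of Martín's replace suggestion, in XQuery; the database and paths are made-up names. In BaseX 8.x, db:replace overwrites the document at the given path, or adds it if absent, so the delete-then-insert pair collapses into a single update:]

```xquery
(: replace (or add) one document in one atomic update :)
db:replace('mydb', 'docs/report_1234.xml', doc('/tmp/report_1234.xml'))
```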
[basex-talk] Finding document based on filename
Hello, I will have hundreds of databases, with each database holding around 100 XML documents. I want to devise an algorithm where, given part of an XML file name, I can find out which database(s) contain it, or get null if the document is not currently present in any database. Based on that, I add the current document to a database. The goal is to always maintain the latest version of a document in the DB, removing the older version while adding the newer one. So far, the only way I could come up with is:

    for $db in all-databases:
        open $db
        $fileNames = list $db
        for eachFileName in $fileNames:
            if $eachFileName.contains(sub-xml filename):
                add to ret-list-db
    return ret-list-db

The above algorithm seems highly inefficient. Is there any indexing that can be done? Do you suggest that for each document insert I maintain a separate XML document listing each file inserted, etc.? Once I get hold of the above list of DBs, I would eventually delete that file and insert the latest version of it (which would have the same sub-xml file name), so constant updating of this external document also seems painful (a map, maybe?). Also, would it be faster to use XQuery script files through Java code, or the Java API, for such operations? How do you all deal with such operations? - Mansi
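For what it's worth, the lookup described above can be written directly in XQuery (a sketch, assuming the partial file name is bound to an external variable $part; it still scans every database, so the cost is the same as the pseudocode):

```xquery
declare variable $part as xs:string external;

(: return every database that stores a resource whose path contains $part :)
for $db in db:list()
where some $path in db:list($db) satisfies contains($path, $part)
return $db
```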
Re: [basex-talk] XQuery Optimization suggestions
As part of preparing to present at XML Prague, I am working on a slide showing statistics. From the comments below, I started wondering whether it would be best to show time taken against the size of the DB or against the number of nodes. What do you all think? If I frame it in terms of number of nodes, would it be a closer comparison with other tools? E.g., 1 million records in a SQL database ~= 1 million nodes in BaseX, making the time-taken comparison closer to apples-to-apples. We are currently battling with this at work too: there are a few different approaches to data mining, for different data sources. I talk in terms of GBs of data in the database, and the SQL fans talk in terms of millions of records. It's hard to make any progress and push for NXDs. - Mansi On Sun, Jan 18, 2015 at 11:24 AM, Christian Grün christian.gr...@gmail.com wrote: Just finished processing 310GB of data, with a result set worth 11 million records, within 44 minutes. I am currently psyched with the potential of even BaseX supporting this kind of data. But I am no expert here. What are your views on these performance statistics? My assumption is that it basically boils down to a sequential scan of most of the elements in the database (so buying faster SSDs will probably be the safest choice to speed up your queries..). 310 GB is a lot, so 44 minutes is probably not that bad. Speaking for myself, though, I was sometimes surprised that other NoSQL systems I tried were not really faster than BaseX if you have hierarchical data structures and need to post-process large amounts of data. However, as your queries look pretty simple, you could also have a look at e.g. MongoDB or RethinkDB (provided that the data can be converted to JSON). Those systems give you convenient Big Data features like distribution/sharding or replication. But I'm also interested in what others say about this. 
Christian - Mansi On Sun, Jan 18, 2015 at 10:49 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, http://localhost:8984/rest?run=get_query.xqn=/Archives/*/descendant::c/descendant::a[contains(@name, xyz)]/@name/data() My guess is that most time is spent to parse all the nodes in the database. If you know more about the database structure, you could replace some of the descendant with explicit child steps. Apart from that, I guess I'm repeating myself, but have you tried to remove duplicates in XQuery, or do grouping and sorting in the language? Usually, it's recommendable to do as much as possible in XQuery itself (although it might not be obvious how to do this at first glance). Christian -- - Mansi -- - Mansi
[basex-talk] XQuery Optimization suggestions
Hello, I am doing some performance analysis on the size of XML files in the DB, the number of records in a result set, and how long it takes to get results. Currently, I have 150GB worth of XML documents imported into BaseX. It took roughly 21 minutes to return a result set worth 5.3 million records. Queries are of the form:

    http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::c/descendant::a[contains(@name,'xyz')]/@name/data()

XQuery file:

    for $db in db:list()
    (: assign dynamic variables to generate the query, to be used in eval :)
    let $query := "declare variable $db external; db:open($db)" || $n
    return xquery:eval($query, map { 'db': $db, 'query': $n })

I have a few questions around this. 1. I have routinely been advised on this mailing list to avoid serialization from XPath and let XQuery handle it. I tried a few things, like replacing string() with data(), adding a serialization option on the REST call, in the XQuery file, etc. But I don't see any performance gain. Is there something else I can try, or something I am doing wrong? 2. Does anyone have any resource comparing this performance to other NoSQL databases? I am just very curious how the above performance numbers compare to other DBs. - Mansi
Re: [basex-talk] XQuery Optimization suggestions
Structure of data is nested, so I have to write queries this way unfortunately. Also, I am doing performance analysis removing all external parameters like any kind of post-processing, network latency etc. Just isolating if I can do any better. So, guess this is the best I can do... No problem at all. Just finished processing 310GB of data, with result set worth 11 million records within 44 minutes. I am currently psyched with the potential of even BaseX supporting this kind of data. But I am no expert here. What are your views on this performance statistics ? - Mansi On Sun, Jan 18, 2015 at 10:49 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, http://localhost:8984/rest?run=get_query.xqn=/Archives/*/descendant::c/descendant::a[contains(@name, xyz)]/@name/data() My guess is that most time is spent to parse all the nodes in the database. If you know more about the database structure, you could replace some of the descendant with explicit child steps. Apart from that, I guess I'm repeating myself, but have you tried to remove duplicates in XQuery, or do grouping and sorting in the language? Usually, it's recommendable to do as much as possible in XQuery itself (although it might not be obvious how to do this at first glance). Christian -- - Mansi
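Christian's two suggestions above could be sketched roughly like this (the element names mirror the hypothetical /Archives structure from the thread; the point is replacing descendant steps with explicit child steps and deduplicating inside XQuery rather than in post-processing):

```xquery
(: child steps let BaseX skip unrelated subtrees; distinct-values
   deduplicates inside the query instead of in a client pipeline :)
distinct-values(
  for $a in /Archives/*/c/a[contains(@name, 'xyz')]
  return string($a/@name)
)
```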
Re: [basex-talk] Silly XQUERY exception
Lukas, That was it!! I feel like shooting myself. What an oversight. Thanks a ton for looking at it and spotting it. - Mansi On Thu, Jan 8, 2015 at 2:44 AM, Lukas Kircher lukaskirch...@gmail.com wrote: Hi Mansi, let $cmd := /A/*/descendant::C/*descandant*::*[contains(@name,'|| $n ||')]” Just a quick scan - I marked the problem in bold above - I would try ‘desc*e*ndant’ instead of 'desc*a*ndant’. Cheers, Lukas -- - Mansi
[basex-talk] Silly XQUERY exception
Hello, I feel very stupid and frustrated at not being able to fix this error. Below is my query code, which I am trying to run. I am passing the value for the contains clause through the command line, and I expect to receive the number of XML files matching the $cmd XPath. I always get:

    Stopped at /veracode/msheth/BaseXWeb/get_prevalence.xq, 16/12: *[XPST0003] Expecting ':=', found ':'.*

When run through REST: *[XPST0003] Expecting valid step, found 'd'.* I think it's the way $cmd is being set. I have tried a simple string concat using ||, using the concat() function, using HTML entities, etc.

    declare variable $n as xs:string external;
    declare option output:item-separator "&#xa;";
    (: let $cmd := concat("/A/*/descendant::C/descandant::*[contains(@name,'", $n, "')]") :)
    *let $cmd := "/A/*/descendant::C/descandant::*[contains(@name,'" || $n || "')]"*
    let $aPath :=
      for $db in db:list()
      let $query := "declare variable $db external; db:open($db)" || $cmd
      *return xquery:eval($query,*
      *  map { 'db': $db, 'query': $cmd })*
    let $clients :=
      for $elem in $aPath
      return db:path($elem)
    return ($n, distinct-values(count($clients)))

The lines of code that are the culprits are marked in bold above. Any and all suggestions are greatly appreciated. - Mansi
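For reference, once the misspelled axis is corrected (as Lukas points out in the reply above, "descandant" should be "descendant"), the offending line parses; a minimal sketch:

```xquery
declare variable $n as xs:string external;
(: the descendant axis spelled correctly; the string is later passed to xquery:eval :)
let $cmd := "/A/*/descendant::C/descendant::*[contains(@name,'" || $n || "')]"
return $cmd
```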
[basex-talk] Design finding almost duplicate xml files
Hello, I am trying to come up with a design which, just before inserting an XML file into the database, will warn us that an almost identical XML file (with a different name and different size) is already stored in the database. "Almost identical" would be based on a few elements of the XML file, such as:

    <root>
      <A name="">
        <B name="">
          <C name=""/>
          <C name=""/>
          .
          .
        </B>
      </A>
      .
      .
      .
    </root>

i.e., the same A and B from the above snippet but different C. Element A could be repeated hundreds of times in a single XML file. Any pointers? - Mansi
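One possible direction (my own sketch, not something proposed in the thread): before inserting, compute a fingerprint over only the elements that define "almost identical" (here the A and B names, ignoring C) and compare it with fingerprints of the stored documents. hash:md5 is from BaseX's Hashing Module; the element names are the placeholders from the snippet above:

```xquery
declare function local:fingerprint($doc as document-node()) as xs:string {
  (: hash only the parts that matter for (near-)equality: A and B names :)
  string(xs:hexBinary(hash:md5(
    string-join(
      $doc//A ! (string(@name) || '|' || string-join(B/@name ! string(.), ',')),
      ';'
    )
  )))
};

(: warn if any stored document shares the candidate's fingerprint :)
declare function local:near-duplicates(
  $db as xs:string, $candidate as document-node()
) as xs:string* {
  for $doc in db:open($db)
  where local:fingerprint($doc) = local:fingerprint($candidate)
  return db:path($doc)
};
```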
Re: [basex-talk] Out Of Memory
Hello, I wanted to get back to this email chain and share my experience. I got this running beautifully (including all post-processing of results) using the command below:

    curl -ig 'http://localhost:8984/rest?run=get_query.xq&n=/Archives/*/descendant::D/@name/string()' | cut -d: -f1 | cut -d. -f1-3 | sort | uniq -c | sort -n -r

I am using the BaseX 8.0 beta 763cc93 build. Running this on an i7 2.7GHz MBP, giving 8GB to the basexhttp process, it took around 34 min on 41 GB of data. I think a lot of the time went into post-processing (sorting) the result set, rather than actually extracting the results from BaseX. When I tried a similar query on a much smaller database (3GB) on a much more powerful Amazon instance, giving 20GB RAM to the basexhttp process, I got results with post-processing within 4 mins. Thanks for all your input, guys. Keep BaseXing... !!! - Mansi On Fri, Nov 7, 2014 at 12:25 PM, Mansi Sheth mansi.sh...@gmail.com wrote: This email chain is extremely helpful. Thanks a ton, guys. Certainly some of the most helpful folks here :) I have to try a lot of these suggestions, but currently I am being pulled into something else, so I have to pause for the time being. Will get back to this email thread after trying a few things, with my relevant observations. - Mansi On Fri, Nov 7, 2014 at 3:48 AM, Fabrice Etanchaud fetanch...@questel.com wrote: Hi Mansi, From what I can see, for each pqr value, you could use db:attribute-range to retrieve all the file names, and group by/count to obtain statistics. You could also create a new collection from an extraction of only the data you need, changing @name into an element, and use full-text fuzzy match. Hoping it helps, Cordialement, Fabrice *De :* basex-talk-boun...@mailman.uni-konstanz.de [mailto: basex-talk-boun...@mailman.uni-konstanz.de] *De la part de* Mansi Sheth *Envoyé :* jeudi 6 novembre 2014 20:55 *À :* Christian Grün *Cc :* BaseX *Objet :* Re: [basex-talk] Out Of Memory I would be doing tons of post processing. I never use UI. 
I either use REST thru cURL or command line. I would basically need data in below format: XML File Name, @name I am trying to whitelist picking up values for only starts-with(@name,pqr). where pqr is a list of 150 odd values. My file names, are essentially some ID/keys, which I would need to map it further using sqlite to some values and may be group by it.. etc. So, basically I am trying to visualize some data, based on its existence in which xml files. So, yes count(query) would be fine, but won't solve much purpose, since I still need value pqr. - Mansi On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün christian.gr...@gmail.com wrote: Query: /A/*//E/@name/string() In the GUI, all results will be cached, so you could think about switching to command line. Do you really need to output all results, or do you do some further processing with the intermediate results? For example, the query count(/A/*//E/@name/string()) will probably run without getting stuck. This query, was going OOM, within few mins. I tried a few ways, of whitelisting, with contain clause, to truncate the result set. That didn't help too. So, now I am out of ideas. This is giving JVM 10GB of dedicated memory. Once, above query works and doesn't go Out Of Memory, I also need corresponding file names too: XYZ.xml //E/@name PQR.xml //E/@name Let me know if you would need more details, to appreciate the issue ? - Mansi On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, I think we need more information on the queries that are causing the problems. Best, Christian On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote: Hello, I have a use case, where I have to extract lots in information from each XML in each DB. Something like, attribute values of most of the nodes in an XML. For such, queries based goes Out Of Memory with below exception. I am giving it ~12GB of RAM on i7 processor. 
Well I can't complain here since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully ? mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp BaseX 8.0 beta b45c1e2 [Server] Server was started (port: 1984) HTTP Server was started (port: 8984) Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll
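The prefix whitelist described earlier in this thread (restricting @name to ~150 known "pqr" values, paired with the containing file name) could be expressed in XQuery roughly like this (a sketch; the element names and the prefix list are placeholders standing in for the real values):

```xquery
(: placeholder prefixes standing in for the ~150 real values :)
declare variable $prefixes := ('pqr', 'abc', 'xyz');

(: emit "file name,@name" pairs, keeping only whitelisted names :)
for $e in /A/*//E[some $p in $prefixes satisfies starts-with(@name, $p)]
return db:path($e) || ',' || string($e/@name)
```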
[basex-talk] data protection at rest
Hello, I am thinking about my options for protecting BaseX data at rest. We are storing sensitive client data which we need to protect. We will be migrating BaseX to an Amazon instance, so protection at rest is a primary concern. What I wish to do is: while inserting data into the database, it should be encrypted, and while querying with XQuery, we should have a mechanism to decrypt the data and retrieve the information needed. I am aware of the performance hit here; I would evaluate whether it is acceptable after collecting some statistics. I looked at the docs: http://docs.basex.org/wiki/Cryptographic_Module#Encryption_.26_Decryption But I didn't completely understand the use case for this example, or whether it would solve my purpose. I am currently using some Java code to insert files into the database. Has anyone done something along these lines? Please share some use cases. - Mansi
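Based on my reading of the Cryptographic Module docs linked above, a symmetric round trip could look like this (a sketch only; I believe crypto:encrypt and crypto:decrypt take the data, the keyword 'symmetric', a key, and an algorithm name, with AES expecting a 16-byte key, but verify the exact signatures against the docs before relying on them):

```xquery
(: 16-byte key for AES; in practice the key must be managed outside the query :)
declare variable $key := 'sixteen-byte-key';

let $secret := crypto:encrypt('sensitive client value', 'symmetric', $key, 'AES')
return crypto:decrypt($secret, 'symmetric', $key, 'AES')
```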
Re: [basex-talk] Distributed processing on roadmap ?
Sorry about the delay. I was busy preparing a presentation for my company on BaseX as our analytics solution. It was very well received; all thanks to you and everyone on this user list :) Based on my use cases, I believe (again, I am no expert in this domain) a map/reduce approach would work better. The result set being returned would contain at most a couple of thousand records, with some post-processing on it, as compared to the TBs of data being queried. If the querying and processing steps could use processing power from a cluster of nodes, maybe we might get a significant performance gain? What are your thoughts? What other use cases do you come across? - Mansi On Mon, Nov 17, 2014 at 10:50 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, it's nice to hear that you have been successfully scaling your database instances so far. I love using BaseX and the powers of BaseX. Currently I am able to query ~60GB of XML files under 2.5 mins. I still have a few more optimizations to try. I also see this data increasing to a couple of TB shortly. I would love to see this kind of processing become almost real time (within a min). So my question: are there any discussions around supporting distributed processing or clusters of nodes, etc.? Yes, distributed processing is a frequently discussed topic. One of our major questions is what challenge to solve first. As you surely know, there are so many different NoSQL stores out there, and all of them tackle different problems. Up to now, we spent most time on replication, but this would not give you better performance. So I would be interested to hear what kind of distribution techniques you believe would give you better performance. Do you think that a map/reduce approach would be helpful, or do you simply have lots of data that somehow needs to be sent to a client as quickly as possible? In other words, how large are your result sets? 
Do you really need the complete results, or would you rather like to draw some conclusions from the scanned data? Back to the current technology… Maybe you could do some Java profiling (using e.g. -Xrunhprof:cpu=samples) in order to find out what's the current bottleneck. Best, Christian -- - Mansi
Re: [basex-talk] Out Of Memory
This email chain, is extremely helpful. Thanks a ton guys. Certainly one of the most helpful folks here :) I have to try a lot of these suggestions but currently I am being pulled into something else, so I have to pause for the time being. Will get back to this email thread, after trying a few things and my relevant observations. - Mansi On Fri, Nov 7, 2014 at 3:48 AM, Fabrice Etanchaud fetanch...@questel.com wrote: Hi Mansi, From what I can see, for each pqr value, you could use db:attribute-range to retrieve all the file names, group by/count to obtain statistics. You could also create a new collection from an extraction of only the data you need, changing @name into element and use full text fuzzy match. Hoping it helps Cordialement Fabrice *De :* basex-talk-boun...@mailman.uni-konstanz.de [mailto: basex-talk-boun...@mailman.uni-konstanz.de] *De la part de* Mansi Sheth *Envoyé :* jeudi 6 novembre 2014 20:55 *À :* Christian Grün *Cc :* BaseX *Objet :* Re: [basex-talk] Out Of Memory I would be doing tons of post processing. I never use UI. I either use REST thru cURL or command line. I would basically need data in below format: XML File Name, @name I am trying to whitelist picking up values for only starts-with(@name,pqr). where pqr is a list of 150 odd values. My file names, are essentially some ID/keys, which I would need to map it further using sqlite to some values and may be group by it.. etc. So, basically I am trying to visualize some data, based on its existence in which xml files. So, yes count(query) would be fine, but won't solve much purpose, since I still need value pqr. - Mansi On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün christian.gr...@gmail.com wrote: Query: /A/*//E/@name/string() In the GUI, all results will be cached, so you could think about switching to command line. Do you really need to output all results, or do you do some further processing with the intermediate results? 
For example, the query count(/A/*//E/@name/string()) will probably run without getting stuck. This query, was going OOM, within few mins. I tried a few ways, of whitelisting, with contain clause, to truncate the result set. That didn't help too. So, now I am out of ideas. This is giving JVM 10GB of dedicated memory. Once, above query works and doesn't go Out Of Memory, I also need corresponding file names too: XYZ.xml //E/@name PQR.xml //E/@name Let me know if you would need more details, to appreciate the issue ? - Mansi On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, I think we need more information on the queries that are causing the problems. Best, Christian On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote: Hello, I have a use case, where I have to extract lots in information from each XML in each DB. Something like, attribute values of most of the nodes in an XML. For such, queries based goes Out Of Memory with below exception. I am giving it ~12GB of RAM on i7 processor. Well I can't complain here since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully ? 
mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp BaseX 8.0 beta b45c1e2 [Server] Server was started (port: 1984) HTTP Server was started (port: 8984) Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Thread.java:744) -- - Mansi -- - Mansi -- - Mansi -- - Mansi
Re: [basex-talk] Dynamic Evaluation of XQUERY
Christian, I am running out of ideas for debugging this. When I execute this query directly within the XQuery file, it works perfectly; only when I pass it through the command line does it break. In fact, the actual .xq file doesn't matter either; as you pointed out, parsing from the command line is what breaks. I tried the -d switch and escaping spaces, but that didn't help. Also, I tested that this is a valid XPath query. Please pardon my XQuery knowledge; it's really not my background. - Mansi On Thu, Nov 6, 2014 at 8:45 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, ~/Downloads/basex/bin/basex -bn='/Archives/*//class[contains(@name,abc) and contains(@name,pqr)]' get_paths.xq Stopped at /Users/mansiadmin/Documents/Research-Projects/BigData, 1/4: [XPDY0002] and: no context value bound. It seems that "and" was interpreted as an XPath step, so it seems as if something went wrong when parsing your query on the command line (I doubt that it's something specific to BaseX). Maybe you can simply try to output the query that causes the error, instead of trying to evaluate it? Christian However, the query below works like a charm: ~/Downloads/basex/bin/basex -bn='/Archives/*//class[contains(@name,abc)]' get_paths.xq I am hoping that, for the first query above, it's some syntactic issue at my end. But I couldn't fix it, so I thought I should point it out. Please advise. Code:

    declare variable $n as xs:string external;
    declare option output:item-separator "&#xa;";
    let $aPath :=
      for $db in db:list()
      let $query := "declare variable $db external; db:open($db)" || $n
      return xquery:eval($query, map { 'db': $db, 'query': $n })
    let $paths :=
      for $elem in $aPath
      return db:path($elem)
    return distinct-values($paths)

On Mon, Nov 3, 2014 at 6:48 PM, Christian Grün christian.gr...@gmail.com wrote: …in the meanwhile, could you please check if the bug has possibly been fixed in the latest 8.0 snapshot [1]? [1] http://files.basex.org/releases/latest On Tue, Nov 4, 2014 at 12:46 AM, Christian Grün christian.gr...@gmail.com wrote: Improper use? 
Potential bug? Your feedback is welcome: Sounds like a little bug indeed; I will check it tomorrow! Contact: basex-talk@mailman.uni-konstanz.de Version: BaseX 7.9 Java: Oracle Corporation, 1.7.0_45 OS: Mac OS X, x86_64 Stack Trace: java.lang.NullPointerException at org.basex.query.value.item.Str.get(Str.java:49) at org.basex.query.func.FNDb.path(FNDb.java:489) at org.basex.query.func.FNDb.item(FNDb.java:128) at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:45) at org.basex.query.func.FNDb.iter(FNDb.java:92) at org.basex.query.gflwor.GFLWOR$2.next(GFLWOR.java:78) at org.basex.query.MainModule$1.next(MainModule.java:98) at org.basex.core.cmd.AQuery.query(AQuery.java:91) at org.basex.core.cmd.XQuery.run(XQuery.java:22) at org.basex.core.Command.run(Command.java:329) at org.basex.core.Command.execute(Command.java:94) at org.basex.server.LocalSession.execute(LocalSession.java:121) at org.basex.server.Session.execute(Session.java:37) at org.basex.core.CLI.execute(CLI.java:106) at org.basex.BaseX.init(BaseX.java:123) at org.basex.BaseX.main(BaseX.java:42) On Thu, Oct 30, 2014 at 5:54 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, you have been close! It could work with the following query (I haven't tried it out, though):

    _ get_query_result.xq
    declare variable $n external;
    declare option output:item-separator "&#xa;";
    let $aList :=
      for $name in db:list()
      let $db := db:open($name)
      return xquery:eval($n, map { '': $db })
    return distinct-values($aList)
    __

In this code, I'm opening the database in the main loop, and I then bind it to the empty string. This way, the database will be the context of the query to be evaluated, and you won't have to deal with bugs that arise from the concatenation of db:open and the query string. 1. Can we assign dynamic values as a value to a map's key? 2. Can a map have more than one key, in xquery:eval? This is both possible. 
As you see in the following query, you'll again have to declare the variables that you want to bind. I agree this causes a lot of code, so we may simplify it again in a future version of BaseX:

    __
    let $n := "/a/b/c"
    for $db in db:list()
    let $query := "declare variable $db external; db:open($db)" || $n
    return xquery:eval($query, map { 'db': $db, 'query': $n })
    __

Best, Christian -- - Mansi
Re: [basex-talk] Out Of Memory
This needs a lot of details, so bear with me. Briefly, my XML files look like:

    <A name="">
      <B name="">
        <E name=""/>
      </B>
      <C name="">
        <E name=""/>
      </C>
      <D name="">
        <E name=""/>
      </D>
    </A>

A can contain B, C or D, and B, C or D can contain E. We have thousands (currently 3000 in my test data set) of such XML files, of 50MB on average. It's tons of data! Currently, my database is ~18GB in size. Query: /A/*//E/@name/string() This query was going OOM within a few minutes. I tried a few ways of whitelisting, with a contains clause, to truncate the result set. That didn't help either, so now I am out of ideas. I am giving the JVM 10GB of dedicated memory. Once the above query works and doesn't go Out Of Memory, I also need the corresponding file names too: XYZ.xml //E/@name PQR.xml //E/@name Let me know if you need more details to appreciate the issue. - Mansi On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, I think we need more information on the queries that are causing the problems. Best, Christian On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote: Hello, I have a use case where I have to extract lots of information from each XML file in each DB, something like the attribute values of most of the nodes in an XML file. Such queries go Out Of Memory with the exception below. I am giving it ~12GB of RAM on an i7 processor. Well, I can't complain here, since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully? 
mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp BaseX 8.0 beta b45c1e2 [Server] Server was started (port: 1984) HTTP Server was started (port: 8984) Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Thread.java:744) -- - Mansi -- - Mansi
Re: [basex-talk] Out Of Memory
Interesting idea. I thought of using db partition but didn't pursue it further, mainly due to the following thought process: currently I am ingesting ~3000 XML files, storing ~50 XML files per DB, and this will grow quickly. So the approach below would lead to ~3000 more files (and increasing), considerably increasing I/O operations for further pre-processing. However, I don't really care if the process takes a few minutes to a few hours (as long as it's not day(s) ;)). Given the situation and my options, I will surely try this. The database is currently indexed at the attribute level, as that's what I will query the most. Do you think I should do anything differently? Thanks, - Mansi On Thu, Nov 6, 2014 at 10:48 AM, Fabrice Etanchaud fetanch...@questel.com wrote: Hi Mansi, Here you have a natural partition of your data: the files you ingested. So my first suggestion would be to query your data on a file basis:

    for $doc in db:open('your_collection_name')
    let $file-name := db:path($doc)
    return file:write(
      $file-name,
      <names> {
        for $name in $doc//E/@name/data()
        return <name>{$name}</name>
      } </names>
    )

Is it for indexing? Hope it helps, Best regards, Fabrice Etanchaud Questel/Orbit *De :* basex-talk-boun...@mailman.uni-konstanz.de [mailto: basex-talk-boun...@mailman.uni-konstanz.de] *De la part de* Mansi Sheth *Envoyé :* jeudi 6 novembre 2014 16:33 *À :* Christian Grün *Cc :* BaseX *Objet :* Re: [basex-talk] Out Of Memory This would need a lot of details, so bear with me below: Briefly my XML files look like: A name= B name= C name= D name= E name=/ A can contain B, C or D and B, C or D can contain E. We have 1000s (currently 3000 in my test data set) of such xml files, of size 50MB on an average. Its tons of data ! Currently, my database is of ~18GB in size. Query: /A/*//E/@name/string() This query, was going OOM, within few mins. I tried a few ways, of whitelisting, with contain clause, to truncate the result set. That didn't help too. So, now I am out of ideas. 
This is giving JVM 10GB of dedicated memory. Once, above query works and doesn't go Out Of Memory, I also need corresponding file names too: XYZ.xml //E/@name PQR.xml //E/@name Let me know if you would need more details, to appreciate the issue ? - Mansi On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, I think we need more information on the queries that are causing the problems. Best, Christian On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote: Hello, I have a use case, where I have to extract lots in information from each XML in each DB. Something like, attribute values of most of the nodes in an XML. For such, queries based goes Out Of Memory with below exception. I am giving it ~12GB of RAM on i7 processor. Well I can't complain here since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully ? mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp BaseX 8.0 beta b45c1e2 [Server] Server was started (port: 1984) HTTP Server was started (port: 8984) Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Thread.java:744) -- - Mansi -- - Mansi -- - Mansi
Re: [basex-talk] Out Of Memory
I would be doing tons of post processing. I never use UI. I either use REST thru cURL or command line. I would basically need data in below format: XML File Name, @name I am trying to whitelist picking up values for only starts-with(@name,pqr). where pqr is a list of 150 odd values. My file names, are essentially some ID/keys, which I would need to map it further using sqlite to some values and may be group by it.. etc. So, basically I am trying to visualize some data, based on its existence in which xml files. So, yes count(query) would be fine, but won't solve much purpose, since I still need value pqr. - Mansi On Thu, Nov 6, 2014 at 11:19 AM, Christian Grün christian.gr...@gmail.com wrote: Query: /A/*//E/@name/string() In the GUI, all results will be cached, so you could think about switching to command line. Do you really need to output all results, or do you do some further processing with the intermediate results? For example, the query count(/A/*//E/@name/string()) will probably run without getting stuck. This query, was going OOM, within few mins. I tried a few ways, of whitelisting, with contain clause, to truncate the result set. That didn't help too. So, now I am out of ideas. This is giving JVM 10GB of dedicated memory. Once, above query works and doesn't go Out Of Memory, I also need corresponding file names too: XYZ.xml //E/@name PQR.xml //E/@name Let me know if you would need more details, to appreciate the issue ? - Mansi On Thu, Nov 6, 2014 at 8:48 AM, Christian Grün christian.gr...@gmail.com wrote: Hi Mansi, I think we need more information on the queries that are causing the problems. Best, Christian On Wed, Nov 5, 2014 at 8:48 PM, Mansi Sheth mansi.sh...@gmail.com wrote: Hello, I have a use case, where I have to extract lots in information from each XML in each DB. Something like, attribute values of most of the nodes in an XML. For such, queries based goes Out Of Memory with below exception. I am giving it ~12GB of RAM on i7 processor. 
Well, I can't complain here since I am most definitely asking for loads of data, but is there any way I can get these kinds of data successfully?

mansi-veracode:BigData mansiadmin$ ~/Downloads/basex/bin/basexhttp
BaseX 8.0 beta b45c1e2 [Server]
Server was started (port: 1984)
HTTP Server was started (port: 8984)
Exception in thread qtp2068921630-18 java.lang.OutOfMemoryError: Java heap space
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
	at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
	at java.lang.Thread.run(Thread.java:744)

-- - Mansi
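The file-name-plus-value output Mansi describes, combined with her starts-with whitelist, could be sketched in XQuery roughly as follows. The database name 'mydb' and the prefix list are hypothetical placeholders, and this has not been run against her data:

```xquery
(: hypothetical database name and whitelist prefixes :)
let $prefixes := ('pqr1', 'pqr2', 'pqr3')
for $doc in db:open('mydb')
for $name in $doc/A/*//E/@name/string()
where (some $p in $prefixes satisfies starts-with($name, $p))
return db:path($doc) || ',' || $name
```

Run from the command line rather than the GUI, so that results are streamed instead of being cached in memory.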
Re: [basex-talk] Dynamic Evaluation of XQUERY
Thanks Christian, the second query below worked beautifully. I am now trying to get the db:path of the dynamic query's results. Code:

  declare variable $n as xs:string external;
  declare option output:item-separator "&#xa;";

  let $aPath :=
    for $db in db:list()
    let $query := "declare variable $db external;" || "db:open($db)" || $n
    return xquery:eval($query, map { 'db': $db, 'query': $n })
  for $elem in $aPath
  return db:path($elem)

and I am getting the below exception when it is called:

mansi@work:BigData mansiadmin$ basex -b\$n='query' get_paths.xq
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 7.9
Java: Oracle Corporation, 1.7.0_45
OS: Mac OS X, x86_64
Stack Trace:
java.lang.NullPointerException
	at org.basex.query.value.item.Str.get(Str.java:49)
	at org.basex.query.func.FNDb.path(FNDb.java:489)
	at org.basex.query.func.FNDb.item(FNDb.java:128)
	at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:45)
	at org.basex.query.func.FNDb.iter(FNDb.java:92)
	at org.basex.query.gflwor.GFLWOR$2.next(GFLWOR.java:78)
	at org.basex.query.MainModule$1.next(MainModule.java:98)
	at org.basex.core.cmd.AQuery.query(AQuery.java:91)
	at org.basex.core.cmd.XQuery.run(XQuery.java:22)
	at org.basex.core.Command.run(Command.java:329)
	at org.basex.core.Command.execute(Command.java:94)
	at org.basex.server.LocalSession.execute(LocalSession.java:121)
	at org.basex.server.Session.execute(Session.java:37)
	at org.basex.core.CLI.execute(CLI.java:106)
	at org.basex.BaseX.init(BaseX.java:123)
	at org.basex.BaseX.main(BaseX.java:42)

On Thu, Oct 30, 2014 at 5:54 AM, Christian Grün christian.gr...@gmail.com wrote:

  Hi Mansi, you have been close!
It could work with the following query (I haven't tried it out, though):

  ____________ get_query_result.xq ____________

  declare variable $n external;
  declare option output:item-separator "&#xa;";

  let $aList :=
    for $name in db:list()
    let $db := db:open($name)
    return xquery:eval($n, map { '': $db })
  return distinct-values($aList)
  _____________________________________________

In this code, I'm opening the database in the main loop, and I then bind it to the empty string. This way, the database will be the context of the query to be evaluated, and you won't have to deal with bugs that arise from the concatenation of db:open and the query string.

  1. Can we assign dynamic values as a value to a map's key?
  2. Can a map have more than one key in xquery:eval?

Both are possible. As you see in the following query, you'll again have to declare the variables that you want to bind. I agree this causes a lot of code, so we may simplify it again in a future version of BaseX:

  _____________________________________________
  let $n := "/a/b/c"
  for $db in db:list()
  let $query := "declare variable $db external;" || "db:open($db)" || $n
  return xquery:eval($query, map { 'db': $db, 'query': $n })
  _____________________________________________

Best, Christian

-- - Mansi
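For Mansi's follow-up goal of also getting the db:path of each hit, the evaluated query has to return database nodes rather than strings, since db:path expects a node. A sketch under that assumption (untested, and $n is assumed to be a node-returning path such as //E, not //E/@name/string()):

```xquery
declare variable $n external;  (: assumed to yield nodes, e.g. //E :)
for $db in db:list()
for $elem in xquery:eval($n, map { '': db:open($db) })
return db:path($elem)
```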
[basex-talk] Dynamic Evaluation of XQUERY
Hello, I want to devise a generic XQuery script which accepts the actual XPath to be run across all documents in all databases from the command line. Something like:

  curl -i "http://localhost:8984/rest?run=get_query_result.xq&n=/root/*//calls/@name/string()"

Basically, parameter n would hold the query. I am trying this using the XQuery module's eval method, but am not succeeding. I have my script as something like:

  declare variable $n as xs:string external;
  declare option output:item-separator "&#xa;";

  let $aList :=
    for $db in db:list()
    let $vars := map { 'db_name': $db, 'query': $n }
    let $query_to_execute := "db:open($db_name)" || $query
    return xquery:eval($query_to_execute, $vars)
  return distinct-values($aList)

My questions are:
1. Can we assign dynamic values as a value to a map's key?
2. Can a map have more than one key in xquery:eval?

Please point me in the right direction, or explain what I am doing wrong in the above code. - Mansi
Re: [basex-talk] Architecture Question
Christian, thanks for all your responses. It truly helps a lot.

Re: importing data into databases: I realized that, for the extent of this POC, I will just count the number of docs in each database (currently programmed to be 50) and keep creating new databases. The structure of the data is the same, but it is nested in nature: a folder can have a folder, which can have a file, etc. Usually, it won't be more than 4 levels deep. That's a good tip, to guess the number of nodes based on byte size. I guess, for the time being, I will move on with just storing 50 docs per DB.

Re: terabytes of data: well, I am planning on using ~6 months' worth of data for any analysis and discarding data prior to that (leaving it around in backups). Obviously, I would be going some cloud route for such resources; we'll see how much budget I can manage to get :) I am very positive about this. So, no, it's not only a theoretical assumption as far as I can see.

Re: querying these databases: currently, I am exploring REST for it. From the documentation, it seems our only option is supporting these queries (on the server side) using XQuery or RESTXQ, no Java/Python? I am well versed with XPath and XSLT, and gearing up towards XQuery now. But it would be a little easier (just my personal preference :)) to manipulate data in Java/Python before serving it back to the client. Is there any such facility? Something like http://localhost:8984/rest?run=getData.java and similarly for Python?

- Mansi

Some preliminary statistics: imported 2050 XML documents in 22 min (including indexing on attributes).

On Sun, Oct 19, 2014 at 6:14 PM, Christian Grün christian.gr...@gmail.com wrote:

  Hi Mansi,

  Is there some book/resource you can point me to, which helps better visualize NXD?

  Sorry for letting you wait. If you want to know more about native XML databases, I recommend having a closer look at various articles in our Wiki (e.g. [1,2]). It will also be helpful if you get into the basics of XQuery [3].
Have you tried to realize some of the hints I gave in my previous mails?

  I am trying to distribute data across multiple databases. I can't distribute based on day, as there could very well be a situation where a single day's data is more than the capacity of a BaseX DB.

If 2 billion XML nodes per day are not enough, you will probably need to create more than one database per day. Via the info db command, you can see how many nodes are currently stored in a database, but there is no cheap solution to find out the number of nodes of an incoming document, because XML documents can be very heterogeneous. Some questions back:

* Do you have some more information on the data you want to store?
* Are all documents similar, or do they vary greatly? If the documents are somewhat similar, you can usually estimate the number of nodes by looking at the byte size.
* Do you know that you will really need to store lots of terabytes of XML data, or is it more of a theoretical assumption?

Christian

[1] http://docs.basex.org/wiki/Database
[2] http://docs.basex.org/wiki/Table_of_Contents
[3] http://docs.basex.org/wiki/Xquery

-- - Mansi
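As a sketch of the info db hint: the node count of an existing database can also be obtained from XQuery, though counting is not free on large databases. The database name here is a hypothetical placeholder:

```xquery
(: 'mydb' is a placeholder; descendant-or-self::node() counts document,
   element, text, comment and PI nodes, so the figure is close to, but may
   differ slightly from, the INFO DB node count :)
count(db:open('mydb')/descendant-or-self::node())
```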
Re: [basex-talk] web application on a local installation ?
EM, are you still facing this? I too installed using Homebrew, started the service using basexhttp, and made an HTTP request:

  http://localhost:8984/rest/db_name?query=query

and everything worked fine. Is your database on the local machine from where you started basexhttp? I initially had my database on an external HDD, and the service wasn't starting. Hope this helps, - Mansi

On Thu, Oct 9, 2014 at 3:51 AM, Emmanuelle Morlock emmanuelle.morl...@mom.fr wrote:

  Hi, sorry to ask a basic question, but I'm a newbie who doesn't always understand all the prerequisites in the documentation. If I can't find help here, just tell me. I tried to install a local instance of BaseX and use the web application features. I work on a Mac and installed BaseX with Homebrew. After changing the password of the admin user and creating a database, I launched the httpserver and typed http://localhost:8984/ in my browser. The result is:

    HTTP ERROR: 503
    Problem accessing /webapp. Reason: Service Unavailable

  What am I missing? Is it even possible to use a web app on a local computer? Thanks in advance for your help... EM

-- - Mansi
Re: [basex-talk] Architecture Question
I am trying to distribute data across multiple databases. I can't distribute based on day, as there could very well be a situation where a single day's data is more than the capacity of a BaseX DB. From the statistics page, the only other way I can distribute is based on the number of nodes. But going with that, I am not able to find a way to access the number of nodes in a DB programmatically. Further, I am clueless whether I can even find the number of nodes of the current doc to be imported. So:

  currentDocToImport = a.xml
  NodeNo(a.xml) = ??
  NumberOfNodes(LastDB) = ??

Do you agree that this is even a way to go? Can someone give me pointers on how to find the above 2 values? Any other thoughts are always welcome... - Mansi

On Tue, Oct 7, 2014 at 5:35 AM, Christian Grün christian.gr...@gmail.com wrote:

  Dear Mansi,

  1. I have 1000s of XML files (each between 50 MB and 400 MB) and this is going to grow exponentially (~200 per day). So, my question is: how scalable is BaseX? Can I configure it to use data from my external HDD in my initial prototype?

  So this means you want to add approx. 40 GB of XML files per day, right, amounting to 14 TB/year? This sounds like quite a lot indeed. You can have a look at our statistics page [1]; it gives you some insight into the current limits of BaseX. However, all limits are per single database. You can distribute your data across multiple databases and address multiple databases with a single XPath/XQuery request. For example, you could create a new database every day and run a query over all these databases:

    for $db in db:list()
    return db:open($db)/path/to/your/data

  2. I plan to heavily use XPath for data retrieval. Does BaseX use any multi-processing or multi-threading to speed up search? Any concurrent processing?

  Read-only requests will automatically be multithreaded. If a single query leads to heavy I/O requests, it may be that single-threaded processing will give you better results (because hard drives are often not very good at reading data in parallel).

  3. Can I do some post-processing on searched and retrieved data? Like sorting, unique elements, etc.?

  With XQuery (3.0), you can do virtually anything with your data. In most of our data-driven scenarios, all data processing is completely done in BaseX. Some plain examples can be found in our Wiki [2].

  Hope this helps, Christian

  [1] http://docs.basex.org/wiki/Statistics
  [2] http://docs.basex.org/wiki/XQuery_3.0

-- - Mansi
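The post-processing in point 3 (sorting, unique elements) can be sketched directly in XQuery across all databases; this is an illustrative query, not one from the thread:

```xquery
(: distinct @name values across every database, sorted alphabetically :)
for $name in distinct-values(
  for $db in db:list()
  return db:open($db)//@name/string()
)
order by $name
return $name
```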
Re: [basex-talk] Architecture Question
Christian, so, going ahead with my POC and the use cases we plan to solve, I have a few more database architecture questions:

1. Is there a way we can have a table with multiple columns, where one of the columns would be an ID and the others would be different XML information for that ID?
2. Can I map the above table to a relational table, to perform join queries on the ID?

Thanks, - Mansi

On Wed, Oct 8, 2014 at 12:53 PM, Christian Grün christian.gr...@gmail.com wrote:

  I just created a single database with ~190 XML files of 8.5 GB total, and activated indexes as well. Creating the database using basexgui took close to an hour. Running a simple XQuery took ~3 min. The database was created on an external USB 3.0 HDD. I will obviously be creating new databases across drives (if this POC is successful, I will surely go for the cloud) to scale it. For the time being, any and all tips to optimize performance are welcome.

  Indeed, performance should be much better if databases are created and queried on HDs or SSDs. Feel free to send us your queries if execution time is not good enough.

  Maybe I will soon contribute to the statistics pages :)

  Thanks, Christian

-- - Mansi
Re: [basex-talk] Architecture Question
On Fri, Oct 10, 2014 at 10:31 AM, Christian Grün christian.gr...@gmail.com wrote:

  Hi Mansi, out of interest: why don't you simply store all documents in the database and use the document path as the ID?

I am storing deeply nested, hierarchical data in XML files. Simply put, most of my queries are going to be relative (e.g. //@name). So, I am assuming it would be a huge performance hit, especially when I know each ID will most definitely have multiple XML documents. Correct me if I am wrong here.

  As BaseX is a native XML store, there is no way to store data in structures like tables. However, due to the flexibility of XML structures, the usual way is to create another document or database that contains the ID and additional metadata.

I don't know if I follow you completely here. Is there some metadata information I can use which maps each XML file stored in the NXD to the relational database you discussed above?

  Best, Christian

-- - Mansi
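Christian's suggestion of a separate metadata document could look roughly like this. The database names, the <entry> record structure, and the use of the document path as the join key are all assumptions made for illustration:

```xquery
(: 'data' holds the XML documents; 'meta' is a hypothetical database of
   <entry id="..."><label>...</label></entry> records keyed by document path :)
for $doc in db:open('data')
let $id := db:path($doc)
return db:open('meta')//entry[@id = $id]/label/string()
```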
Re: [basex-talk] Architecture Question
Thanks Christian. Re: size of data, I am hoping some days would be quieter than discussed below, but yes, it's going to be a lot of data. I just created a single database with ~190 XML files of 8.5 GB total, and activated indexes as well. Creating the database using basexgui took close to an hour. Running a simple XQuery took ~3 min. The database was created on an external USB 3.0 HDD. I will obviously be creating new databases across drives (if this POC is successful, I will surely go for the cloud) to scale it. For the time being, any and all tips to optimize performance are welcome. Maybe I will soon contribute to the statistics pages :) - Mansi

On Tue, Oct 7, 2014 at 5:35 AM, Christian Grün christian.gr...@gmail.com wrote:

  Dear Mansi,

  1. I have 1000s of XML files (each between 50 MB and 400 MB) and this is going to grow exponentially (~200 per day). So, my question is: how scalable is BaseX? Can I configure it to use data from my external HDD in my initial prototype?

  So this means you want to add approx. 40 GB of XML files per day, right, amounting to 14 TB/year? This sounds like quite a lot indeed. You can have a look at our statistics page [1]; it gives you some insight into the current limits of BaseX. However, all limits are per single database. You can distribute your data across multiple databases and address multiple databases with a single XPath/XQuery request. For example, you could create a new database every day and run a query over all these databases:

    for $db in db:list()
    return db:open($db)/path/to/your/data

  2. I plan to heavily use XPath for data retrieval. Does BaseX use any multi-processing or multi-threading to speed up search? Any concurrent processing?

  Read-only requests will automatically be multithreaded. If a single query leads to heavy I/O requests, it may be that single-threaded processing will give you better results (because hard drives are often not very good at reading data in parallel).

  3. Can I do some post-processing on searched and retrieved data? Like sorting, unique elements, etc.?

  With XQuery (3.0), you can do virtually anything with your data. In most of our data-driven scenarios, all data processing is completely done in BaseX. Some plain examples can be found in our Wiki [2].

  Hope this helps, Christian

  [1] http://docs.basex.org/wiki/Statistics
  [2] http://docs.basex.org/wiki/XQuery_3.0

-- - Mansi
[basex-talk] Architecture Question
Hello, I have been going through and comparing different native XML databases, and so far I am liking BaseX. However, there are still a few questions unanswered before I make a final choice:

1. I have 1000s of XML files (each between 50 MB and 400 MB) and this is going to grow exponentially (~200 per day). So, my question is: how scalable is BaseX? Can I configure it to use data from my external HDD in my initial prototype?

2. I plan to heavily use XPath for data retrieval. Does BaseX use any multi-processing or multi-threading to speed up search? Any concurrent processing?

3. Can I do some post-processing on searched and retrieved data? Like sorting, unique elements, etc.?

- Mansi