Re: [basex-talk] Coding help

2019-08-05 Thread Rick Graham
Hi Greg,

So, to be clear and succinct, the goal is to create a single XML file
containing all XML files that have a predefined text string match in them,
yes?

If so, I'm wondering if creating any database is necessary. A single pass
through all the files, searching for the text string, and appending matched
files as you go seems sufficient.
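
A minimal sketch of that single-pass idea, in Python rather than XQuery.
The function name `collect_matches`, the `matches` wrapper element, the
`.xml` suffix filter, and the UTF-8 assumption are all illustrative and not
from this thread. The match is a plain substring test on the raw file text,
so it also hits markup, and the output file should live outside the scanned
folder:

```python
import os


def collect_matches(folder, needle, out_path, root_tag="matches"):
    """Scan every .xml file under `folder` once and append those containing
    `needle` to a single wrapper document at `out_path`.
    Returns the number of matched files."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(f"<{root_tag}>\n")
        for dirpath, _dirs, files in os.walk(folder):
            for name in sorted(files):
                if not name.endswith(".xml"):
                    continue
                path = os.path.join(dirpath, name)
                # Only one input file is held in memory at a time.
                with open(path, encoding="utf-8") as fh:
                    text = fh.read()
                if needle not in text:
                    continue
                # Drop a leading XML declaration so the file can be nested.
                if text.startswith("<?xml"):
                    text = text[text.index("?>") + 2:]
                out.write(text.strip() + "\n")
                count += 1
        out.write(f"</{root_tag}>\n")
    return count
```

Files with a DOCTYPE or a non-UTF-8 encoding would need extra handling
before they can be nested like this.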

R

On Mon, Aug 5, 2019, 08:42 Greg Kawchuk wrote:

> Hi everyone,
> I'm wondering if someone could provide what I think is a brief script for
> a scientific project to do the following.
> The goal is to identify XML documents from a very large collection that
> would be too big to load into a database all at once.
>
> Here is how I see the functions provided by the code.
> 1. In the script, the user could enter the path of the target folder (with
> millions of XML documents).
> 2. In the script, the user would enter the number of documents to load
> into a database at a given time (i = 1,000) depending on memory
> limitations.
> 3. The code would then create a temporary database from the first (i) xml
> files in the target folder.
> 4. The code would then search the 1000 xml documents in the database for a
> pre-defined text string.
> 5. If hits exist for the text string, the code would write those documents
> to a unique XML file.
> 6. Clear the database.
> 7. Read in the next 1000 files (or remaining files in the folder).
> 8. Return to #4.
>
> There would be no need to append XML files in step 5. The resulting XML
> files could be concatenated afterwards.
> Thank you in advance. If you have any questions, please feel free to email
> me here.
> Greg
>
> ***
> Greg Kawchuk BSC, DC, MSc, PhD.
> Professor, Faculty of Rehabilitation Medicine
> University of Alberta
> greg.kawc...@ualberta.ca
> 780-492-6891
>


Re: [basex-talk] BaseX/GUI v9.1.2 memory use

2019-01-22 Thread Rick Graham
Thanks Bridger!

Indeed, I quit basexgui and manually edited .basexgui to set the project
directory to a newly created empty directory.  basexgui seems normal/stable
after that.

I rarely, as in almost never, use wine but I didn't have this issue with
previous versions of BaseX.  Something seems unexpected here.
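
For illustration only, here is a Python sketch of a directory walk that
follows symbolic links but cannot get caught in a link cycle such as the one
under `.wine/dosdevices`. It is not BaseX's Java code; the function name
`walk_without_loops` is made up, and tracking resolved paths in a set is
just one possible guard:

```python
import os


def walk_without_loops(root):
    """Yield file paths under `root`, following symlinked directories but
    visiting each real directory at most once."""
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        real = os.path.realpath(current)
        if real in seen:
            continue  # already visited via another path: a cycle or an alias
        seen.add(real)
        try:
            entries = list(os.scandir(current))
        except OSError:
            continue  # unreadable directory, or ELOOP on a broken chain
        for entry in entries:
            try:
                if entry.is_dir():
                    stack.append(entry.path)
                elif entry.is_file():
                    yield entry.path
            except OSError:
                continue  # e.g. too many levels of symbolic links
```

A traversal without such a guard keeps descending until the path is too
long or memory runs out.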


On Tue, Jan 22, 2019 at 11:04 PM Bridger Dyson-Smith 
wrote:

> Hi Rick, et al,
> I think (but am not 100% sure) that the GUI defaults to looking through
> your home directory on startup. So, somewhere in
> `~/rick/.wine/dosdevices/...` you have symbolic links that are looped.
>
> I think you might be able to circumvent this problem by finding
> `.basexgui` - it would probably be close to wherever you started the GUI
> from on your filesystem. I think you can edit some of the PATHS there and
> that may help?
>
> Again, I'm not sure. HTH!
> Best,
> Bridger
>
> On Tue, Jan 22, 2019 at 4:56 PM Rick Graham  wrote:
>
>> The command-line seemed to be operating normally.
>>
>> What exactly is/are my project directories?
>>
>> I attached to the running GUI instance with
>> `strace -f -e trace=stat -p 13368` and it shows endless repetitions of:
>>
>> [pid 13436]
>>> stat("/home/rick/.wine/dosdevices/z:/sys/class/thermal/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone0/device/subsystem/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:75/subsystem/devices/PNP0C0A:02",
>>> 0x7f7beb2796e0) = -1 ELOOP (Too many levels of symbolic links)
>>
>>
>> What's going on here?
>>
>> On Tue, Jan 22, 2019 at 10:21 PM Christian Grün <
>> christian.gr...@gmail.com> wrote:
>>
>>>> at org.basex.gui.view.project.ProjectFiles.add(ProjectFiles.java:167)
>>>> at org.basex.gui.view.project.ProjectFiles.add(ProjectFiles.java:173)
>>>> at org.basex.gui.view.project.ProjectFiles.add(ProjectFiles.java:173)
>>>> at org.basex.gui.view.project.ProjectFiles.add(ProjectFiles.java:173)
>>>
>>> Looks like an endless loop that is caused by parsing the files in your
>>> project directory. Do you possibly have any symbolic links?
>>>
>>> Can you reproduce the problem with a completely fresh BaseX zip archive?
>>>
>>>
>>>


Re: [basex-talk] BaseX/GUI v9.1.2 memory use

2019-01-22 Thread Rick Graham
The command-line seemed to be operating normally.

What exactly is/are my project directories?

I attached to the running GUI instance with
`strace -f -e trace=stat -p 13368` and it shows endless repetitions of:

[pid 13436]
> stat("/home/rick/.wine/dosdevices/z:/sys/class/thermal/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone2/subsystem/thermal_zone0/device/subsystem/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:75/subsystem/devices/PNP0C0A:02",
> 0x7f7beb2796e0) = -1 ELOOP (Too many levels of symbolic links)


What's going on here?

On Tue, Jan 22, 2019 at 10:21 PM Christian Grün 
wrote:

>> at org.basex.gui.view.project.ProjectFiles.add(ProjectFiles.java:167)
>> at org.basex.gui.view.project.ProjectFiles.add(ProjectFiles.java:173)
>> at org.basex.gui.view.project.ProjectFiles.add(ProjectFiles.java:173)
>> at org.basex.gui.view.project.ProjectFiles.add(ProjectFiles.java:173)
>
> Looks like an endless loop that is caused by parsing the files in your
> project directory. Do you possibly have any symbolic links?
>
> Can you reproduce the problem with a completely fresh BaseX zip archive?
>
>
>


Re: [basex-talk] BaseX/GUI v9.1.2 memory use

2019-01-22 Thread Rick Graham
laf.synth.SynthTabbedPaneUI.update(SynthTabbedPaneUI.java:376)
> at javax.swing.JComponent.paintComponent(JComponent.java:780)
> at javax.swing.JComponent.paint(JComponent.java:1056)
> at javax.swing.JComponent.paintChildren(JComponent.java:889)
> at javax.swing.JComponent.paint(JComponent.java:1065)
> at javax.swing.JComponent.paintChildren(JComponent.java:889)
> at javax.swing.JComponent.paint(JComponent.java:1065)
> at javax.swing.JComponent.paintChildren(JComponent.java:889)
> at javax.swing.JComponent.paint(JComponent.java:1065)
> at javax.swing.JComponent.paintChildren(JComponent.java:889)
> at javax.swing.JComponent.paint(JComponent.java:1065)
> at javax.swing.JComponent.paintChildren(JComponent.java:889)
> at javax.swing.JComponent.paint(JComponent.java:1065)
> at javax.swing.JComponent.paintChildren(JComponent.java:889)
> at javax.swing.JComponent.paint(JComponent.java:1065)
> at javax.swing.JComponent.paintChildren(JComponent.java:889)
> at javax.swing.JComponent.paint(JComponent.java:1065)
> at javax.swing.JComponent.paintChildren(JComponent.java:889)
> at javax.swing.JComponent.paint(JComponent.java:1065)
> at org.basex.gui.view.ViewContainer.paint(ViewContainer.java:221)
> at javax.swing.JComponent.paintChildren(JComponent.java:889)
> at javax.swing.JComponent.paint(JComponent.java:1065)
> at javax.swing.JComponent.paintChildren(JComponent.java:889)
> at javax.swing.JComponent.paint(JComponent.java:1065)
> at javax.swing.JComponent.paintChildren(JComponent.java:889)
> at javax.swing.JComponent.paint(JComponent.java:1065)



On Tue, Jan 22, 2019 at 10:03 PM Christian Grün 
wrote:

> Hm, I couldn’t reproduce this out of the box. Does the problem only
> occur in your GUI instance? Did you check out the behavior on
> command-line or in the DBA as well?
>
>
> On Tue, Jan 22, 2019 at 9:51 PM Rick Graham  wrote:
> >
> > Hello,
> >
> > Thanks again, as always, for a great product.
> >
> > I just installed BaseX v9.1.2 (upgrading from a previous v9.1.2
> > snapshot), launched the GUI and then got interrupted.  When I returned,
> > almost all of the JVM's memory was being used.  I hit "GC" several times
> > but it didn't seem to help.  I had no database loaded/open.  Seems like
> > some memory isn't getting freed properly.
> >
> > Here's my "INFO"
> >
> >> General Information:
> >>  Version: 9.1.2
> >>  Used Memory: 1593 MB
> >> Global options:
> >>  AUTHMETHOD: Basic
> >>  CACHETIMEOUT: 3600
> >>  DBPATH: /usr/local/src/basex/data
> >>  DEBUG: false
> >>  FAIRLOCK: false
> >>  HOST: localhost
> >>  HTTPLOCAL: false
> >>  IGNORECERT: false
> >>  IGNOREHOSTNAME: false
> >>  KEEPALIVE: 600
> >>  LANG: English
> >>  LANGKEYS: false
> >>  LOG: true
> >>  LOGMSGMAXLEN: 1000
> >>  LOGPATH: .logs
> >>  NONPROXYHOSTS:
> >>  PARALLEL: 8
> >>  PARSERESTXQ: 3
> >>  PASSWORD:
> >>  PORT: 1984
> >>  PROXYHOST:
> >>  PROXYPORT: 0
> >>  REPOPATH: /usr/local/src/basex/repo
> >>  RESTPATH:
> >>  RESTXQPATH:
> >>  SERVERHOST:
> >>  SERVERPORT: 1984
> >>  STOPPORT: 8985
> >>  TIMEOUT: 30
> >>  USER:
> >>  WEBPATH: /usr/local/src/basex/webapp
> >> Local options
> >>  ADDARCHIVES: true
> >>  ADDCACHE: false
> >>  ADDRAW: false
> >>  ARCHIVENAME: false
> >>  ATTRINCLUDE:
> >>  ATTRINDEX: true
> >>  AUTOFLUSH: true
> >>  AUTOOPTIMIZE: false
> >>  BINDINGS:
> >>  CASESENS: false
> >>  CATFILE:
> >>  CHECKSTRINGS: true
> >>  CHOP: true
> >>  COMPPLAN: true
> >>  COPYNODE: true
> >>  CREATEFILTER: *.xml
> >>  CREATEONLY: false
> >>  CSVPARSER:
> >>  DEFAULTDB: false
> >>  DIACRITICS: false
> >>  DOTCOMPACT: false
> >>  DOTPLAN: false
> >>  DTD: false
> >>  ENFORCEINDEX: false
> >>  EXPORTER:
> >>  FORCECREATE: false
> >>  FTINCLUDE:
> >>  FTINDEX: false
> >>  HTMLPARSER:
> >>  INLINELIMIT: 100
> >>  INTPARSE: false
> >>  JSONPARSER:
> >>  LANGUAGE: en
> >>  LSERROR: 0
> >>  MAINMEM: false
> >>  MAXCATS: 100
> >>  MAXLEN: 96
> >>  MAXSTAT: 30
> >>  MIXUPDATES: false
> >>  PARSER: xml
> >>  QUERYINFO: true
> >>  RUNQUERY: true
> >>  RUNS: 1
> >>  SERIALIZE: true
> >>  SERIALIZER:
> >>  SKIPCORRUPT: false
> >>  SPLITSIZE: 0
> >>  STEMMING: false
> >>  STOPWORDS:
> >>  STRIPNS: false
> >>  TAILCALLS: 256
> >>  TEXTINCLUDE:
> >>  TEXTINDEX: true
> >>  TEXTPARSER:
> >>  TOKENINCLUDE:
> >>  TOKENINDEX: false
> >>  UPDINDEX: false
> >>  WRITEBACK: false
> >>  XINCLUDE: true
> >>  XMLPLAN: true
> >
> >
> > Best regards,
> > Richard
> >
>


[basex-talk] BaseX/GUI v9.1.2 memory use

2019-01-22 Thread Rick Graham
Hello,

Thanks again, as always, for a great product.

I just installed BaseX v9.1.2 (upgrading from a previous v9.1.2 snapshot),
launched the GUI and then got interrupted.  When I returned, almost all of
the JVM's memory was being used.  I hit "GC" several times but it didn't
seem to help.  I had no database loaded/open.  Seems like some memory isn't
getting freed properly.

Here's my "INFO"

General Information:
>  Version: 9.1.2
>  Used Memory: 1593 MB
> Global options:
>  AUTHMETHOD: Basic
>  CACHETIMEOUT: 3600
>  DBPATH: /usr/local/src/basex/data
>  DEBUG: false
>  FAIRLOCK: false
>  HOST: localhost
>  HTTPLOCAL: false
>  IGNORECERT: false
>  IGNOREHOSTNAME: false
>  KEEPALIVE: 600
>  LANG: English
>  LANGKEYS: false
>  LOG: true
>  LOGMSGMAXLEN: 1000
>  LOGPATH: .logs
>  NONPROXYHOSTS:
>  PARALLEL: 8
>  PARSERESTXQ: 3
>  PASSWORD:
>  PORT: 1984
>  PROXYHOST:
>  PROXYPORT: 0
>  REPOPATH: /usr/local/src/basex/repo
>  RESTPATH:
>  RESTXQPATH:
>  SERVERHOST:
>  SERVERPORT: 1984
>  STOPPORT: 8985
>  TIMEOUT: 30
>  USER:
>  WEBPATH: /usr/local/src/basex/webapp
> Local options
>  ADDARCHIVES: true
>  ADDCACHE: false
>  ADDRAW: false
>  ARCHIVENAME: false
>  ATTRINCLUDE:
>  ATTRINDEX: true
>  AUTOFLUSH: true
>  AUTOOPTIMIZE: false
>  BINDINGS:
>  CASESENS: false
>  CATFILE:
>  CHECKSTRINGS: true
>  CHOP: true
>  COMPPLAN: true
>  COPYNODE: true
>  CREATEFILTER: *.xml
>  CREATEONLY: false
>  CSVPARSER:
>  DEFAULTDB: false
>  DIACRITICS: false
>  DOTCOMPACT: false
>  DOTPLAN: false
>  DTD: false
>  ENFORCEINDEX: false
>  EXPORTER:
>  FORCECREATE: false
>  FTINCLUDE:
>  FTINDEX: false
>  HTMLPARSER:
>  INLINELIMIT: 100
>  INTPARSE: false
>  JSONPARSER:
>  LANGUAGE: en
>  LSERROR: 0
>  MAINMEM: false
>  MAXCATS: 100
>  MAXLEN: 96
>  MAXSTAT: 30
>  MIXUPDATES: false
>  PARSER: xml
>  QUERYINFO: true
>  RUNQUERY: true
>  RUNS: 1
>  SERIALIZE: true
>  SERIALIZER:
>  SKIPCORRUPT: false
>  SPLITSIZE: 0
>  STEMMING: false
>  STOPWORDS:
>  STRIPNS: false
>  TAILCALLS: 256
>  TEXTINCLUDE:
>  TEXTINDEX: true
>  TEXTPARSER:
>  TOKENINCLUDE:
>  TOKENINDEX: false
>  UPDINDEX: false
>  WRITEBACK: false
>  XINCLUDE: true
>  XMLPLAN: true


Best regards,
Richard


Re: [basex-talk] BaseX/GUI using all memory

2018-12-17 Thread Rick Graham
Hi Christian,

I tried the snapshot release and it seems to be much better about releasing
memory.

I still needed to increase "-Xmx" so that nvdcve-1.0-2018.json.zip could be
loaded (all default settings, I think), but after I did, everything seemed
to work fine.

I'll continue to exercise it.

Thanks!

Best regards,
RG

On Mon, Dec 17, 2018 at 6:39 PM Christian Grün 
wrote:

> Hi Rick,
>
> While investing some more time in profiling, we encountered one memory
> leak by a) creating a database and b) adding additional documents via
> the Database Manage dialog in a second step. In Java, a strange
> decision was taken that top-level swing containers (such as our
> progress bar) won’t be garbage collected, even after they have been
> disposed [1].
>
> I guess this is not a very serious leak in BaseX (it has never been
> reported in the past), but I have added a quick fix to tackle the most
> obvious case, and I’ll be interested in hearing if this will already
> reduce memory usage in your use experiments. A new snapshot is
> available [2].
>
> Best,
> Christian
>
> [1] https://github.com/BaseXdb/basex/issues/1650
> [2] http://files.basex.org/releases/latest/
>
>
>
> On Mon, Dec 17, 2018 at 12:49 PM Christian Grün
>  wrote:
> >
> > Hi Rick,
> >
> > Thanks for your observations.
> >
> > I restricted my main memory to 2 GB and I played around with your
> > sample data (with Windows). My memory consumption never exceeded 200
> > MB (and after closing everything, it goes back to appr. 30 MB). Maybe
> > there is a single operation that I missed?
> >
> > If the limits for query results have been increased in the GUI
> > Preferences (in the "Result" tab), memory consumption might rise as
> > well. If you have not changed the defaults, you could help us by…
> >
> > • opening the "Used Memory" dialog (I think you have done this
> > already, right?) and
> > • clicking the "GC" after each single action you perform.
> >
> > If the "Used Memory" value rises a lot after a specific action even
> > after garbage collection, and if it doesn’t decrease after closing the
> > database, your editor tabs, visualizations, etc., then you might have
> > been able to isolate the operation that leads to the observed memory
> > leak. If you believe that the visualizations might affect memory
> > consumption, you can close them if a database is opened, and restart
> > BaseX (visualizations won’t be computed if they are not displayed).
> > Feel free to provide us with a list of the actions and the values for
> > the observed memory consumption.
> >
> > Best,
> > Christian
>


[basex-talk] BaseX/GUI using all memory

2018-12-15 Thread Rick Graham
Greetings,

First of all, thank you very much for BaseX.  It has made many of my
assignments this semester doable, more enjoyable, and better.  Using it, I
could demonstrate database features at scale where only toy minimal
examples were required.

The source of most of my data has been NIST's National Vulnerability
Database and the MITRE-curated Common Weakness Enumeration lists.  Using
BaseX, I found and reported to NIST some errors in the CVE database, and
the errors have since been fixed.  I'm sure there are many more fixes and
enhancements possible with the CVE database.

I almost exclusively use BaseX/GUI (version 9.0.2, then 9.1, and now 9.1.1)
on Fedora Linux.  I do have some issues using BaseX/GUI and I am hoping
that some improvements can be made.

Eventually, BaseX/GUI uses all of its allocated memory.  Even after
increasing the max memory to 3.5GB, eventually it is all used and BaseX/GUI
essentially freezes.  Any operation that freezes executes quickly and
completely after quitting/killing BaseX/GUI and then restarting.  It seems
that some memory just never gets freed as I develop different XQuery
routines in the Editor, run them, save them to files, click in various
places on the Map Visualization, run XQuery on the Input Bar, etc.
Sometimes closing the database frees the memory, mostly it doesn't.  Once I
think memory was freed when I saved a file in the Editor.  The Java error
messages all seem to relate to running out of memory.  Hitting the "GC"
button never seems to help.  I don't have a specific sequence of actions
that eventually consumes all the memory.

An example database that will demonstrate this memory consumption is
composed of NVD-CVE-1.0-2018 and CWE Comprehensive View.  The database
takes about 109 MB and has about 5.25 million nodes.  Just
viewing the Visualization Map, clicking around, and running/editing queries
like this will eventually use all the memory.

    let $cwe := distinct-values(
      for $c in //cve//problemtype__data//value
      return tokenize($c, "-")[last()]
    )
    for $c in $cwe
    where empty(//Weakness_Catalog[contains(@Name, "CWE-2000")]//Weakness[@ID = $c])
      and empty(//Weakness_Catalog[contains(@Name, "CWE-2000")]//Related_Weakness[@CWE_ID = $c])
    order by number($c)
    return $c


`java -version` reports:

    openjdk version "1.8.0_191"
    OpenJDK Runtime Environment (build 1.8.0_191-b12)
    OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)


I will gladly provide any additional info that may help to diagnose these
symptoms, etc.

Thanks and best regards.
RG


Re: [basex-talk] BUG: Can't parse JSON from GZIP archive

2018-11-26 Thread Rick Graham
Hi Christian,

Thanks for your quick replies.

"ISIZE (Input SIZE)" from https://tools.ietf.org/html/rfc1952 looks
promising for most GZIP archives containing a single file.
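
As a rough illustration (Python, not BaseX code), ISIZE can be read from the
last four bytes of a gzip file. The function name `gzip_isize` is
hypothetical. Per RFC 1952 the field is little-endian and holds the
uncompressed size modulo 2^32, and for multi-member files it describes only
the last member. The sketch also needs a seekable local file, which is
exactly the limitation Christian mentions for generic input streams:

```python
import os
import struct


def gzip_isize(path):
    """Return the ISIZE field: the uncompressed size modulo 2**32,
    stored little-endian in the last four bytes of a gzip file."""
    with open(path, "rb") as fh:
        fh.seek(-4, os.SEEK_END)  # the trailer ends with CRC32, then ISIZE
        (size,) = struct.unpack("<I", fh.read(4))
    return size
```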

N.B.: The Database Resource Properties INPUTSIZE for ZIP archives also
shows "0 b".

Thanks and regards,
RG

On Mon, Nov 26, 2018 at 3:29 PM Christian Grün 
wrote:

> > ... would you want to set the Database Resource Properties INPUTSIZE to
> > something other than "0 b" when the INPUTPATH is an archive?
>
> In contrast to ZIP archives, there seems to be no trivial way in Java
> to retrieve the uncompressed file size from gzipped input streams. We
> could do some extra efforts (as e.g. proposed in [1]). As the
> processed input stream in BaseX may not rely on a local file, I am not
> sure if there is a generic solution for that.
>
> [1]
> https://stackoverflow.com/questions/7317243/gets-the-uncompressed-size-of-this-gzipinputstream
>
>
>
> > On Mon, Nov 26, 2018 at 11:49 AM Christian Grün <christian.gr...@gmail.com> wrote:
> >>
> >> A new stable snapshot is available [1]. In the updated version, all
> >> corner cases should be taken into consideration (such as gzip archive
> >> with missing file suffix in the file name).
> >>
> >> Hope this helps,
> >> Christian
> >>
> >> [1] http://files.basex.org/releases/latest/
> >>
> >>
> >>
> >> On Mon, Nov 26, 2018 at 10:52 AM Christian Grün
> >>  wrote:
> >> >
> >> > Hi Rick,
> >> >
> >> > > I just wanted to use
> >> > > https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.gz with
> >> > > basexgui.  basexgui doesn't seem to process the archive correctly.
> >> >
> >> > I got it. So you were choosing JSON as input format, and the archive
> >> > input was not chosen for import.
> >> >
> >> > The challenge seems to be that the filename is not stored inside this
> >> > particular .gz archive, so the ".json" substring in the original file
> >> > is the only hint that the compressed file is a json file. This is
> >> > different for ZIP archives, in which filenames must be stored inside
> >> > the archive (in .gz archives this is optional).
> >> >
> >> > By default, we thus assume that the input of .gz archives is XML. I’ll
> >> > see if/how we can find a solution for this, and if the input format
> >> > choice can be utilized to correctly interpret the file contents.
> >> >
> >> > > P.S.:  Regarding GitHub issues...  I know how to search those.  How
> >> > > do I search past mailman threads?
> >> >
> >> > You can search via the basex-talk mail archive (see the link on our
> >> > web site [1]). Classical search engines will give you valuable results
> >> > from StackOverflow and other sites.
> >> >
> >> > Best,
> >> > Christian
> >> >
> >> > [1] http://basex.org/about/open-source/
> >> >
> >> >
> >> >
> >> > > On Sun, Nov 25, 2018 at 8:53 PM Christian Grün <christian.gr...@gmail.com> wrote:
> >> > >>
> >> > >> Hi Rick,
> >> > >>
> >> > >>> Would've filed an issue, but the request is to post here first.  (?)
> >> > >>
> >> > >> Thanks. Many GitHub issues in the past were no bugs, but
> >> > >> misunderstandings, so we are asking users to write to the list first.
> >> > >>
> >> > >>> Using version 9.1 BaseX app, a GZIP archive of a JSON database
> >> > >>> can't be used to properly create a database.  Interestingly, a ZIP
> >> > >>> archive works fine.
> >> > >>
> >> > >> Do you really want to create a BaseX database from a "JSON
> >> > >> database"? If yes, which format has this database?
> >> > >>
> >> > >> Or does your archive contain a set of (tarred) JSON files, which
> >> > >> you would like to import in BaseX as XML? Did you try to rename your
> >> > >> file suffix to .tgz?
> >> > >>
> >> > >> Best,
> >> > >> Christian
> >> > >>
> >> > >>
> >> > >>
>


Re: [basex-talk] BUG: Can't parse JSON from GZIP archive

2018-11-26 Thread Rick Graham
Hi Christian,

Yes, that feature works fine in the latest snapshot.  Thank you.  I'm
wondering if an email to n...@nist.gov might encourage them to include
filenames in all their archives.
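
For anyone who wants to check a given archive, here is a small Python sketch
(not BaseX code) that reports whether a gzip header carries the optional
original file name. The function name `gzip_original_name` is made up; the
header layout and flag bits follow RFC 1952:

```python
import struct

FEXTRA, FNAME = 0x04, 0x08  # FLG bits defined in RFC 1952


def gzip_original_name(data):
    """Return the original file name stored in a gzip header, or None if
    the FNAME flag is not set. `data` is the start of the gzip stream."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = data[3]
    pos = 10  # fixed header: ID1 ID2 CM FLG MTIME(4) XFL OS
    if flags & FEXTRA:
        # Skip the optional extra field: 2-byte length, then that many bytes.
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    if not flags & FNAME:
        return None
    end = data.index(b"\x00", pos)  # the name is zero-terminated
    return data[pos:end].decode("latin-1")
```

Reading the first few hundred bytes of the file is normally enough input,
unless the header carries an unusually large extra field.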

And while you're poking around the BaseX archive stuff ... would you want
to set the Database Resource Properties INPUTSIZE to something other than
"0 b" when the INPUTPATH is an archive?

Thanks again,
RG

On Mon, Nov 26, 2018 at 11:49 AM Christian Grün 
wrote:

> A new stable snapshot is available [1]. In the updated version, all
> corner cases should be taken into consideration (such as gzip archive
> with missing file suffix in the file name).
>
> Hope this helps,
> Christian
>
> [1] http://files.basex.org/releases/latest/
>
>
>
> On Mon, Nov 26, 2018 at 10:52 AM Christian Grün
>  wrote:
> >
> > Hi Rick,
> >
> > > I just wanted to use
> > > https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.gz with
> > > basexgui.  basexgui doesn't seem to process the archive correctly.
> >
> > I got it. So you were choosing JSON as input format, and the archive
> > input was not chosen for import.
> >
> > The challenge seems to be that the filename is not stored inside this
> > particular .gz archive, so the ".json" substring in the original file
> > is the only hint that the compressed file is a json file. This is
> > different for ZIP archives, in which filenames must be stored inside
> > the archive (in .gz archives this is optional).
> >
> > By default, we thus assume that the input of .gz archives is XML. I’ll
> > see if/how we can find a solution for this, and if the input format
> > choice can be utilized to correctly interpret the file contents.
> >
> > > P.S.:  Regarding GitHub issues...  I know how to search those.  How do
> > > I search past mailman threads?
> >
> > You can search via the basex-talk mail archive (see the link on our
> > web site [1]). Classical search engines will give you valuable results
> > from StackOverflow and other sites.
> >
> > Best,
> > Christian
> >
> > [1] http://basex.org/about/open-source/
> >
> >
> >
> > > On Sun, Nov 25, 2018 at 8:53 PM Christian Grün <christian.gr...@gmail.com> wrote:
> > >>
> > >> Hi Rick,
> > >>
> > >>> Would've filed an issue, but the request is to post here first.  (?)
> > >>
> > >> Thanks. Many GitHub issues in the past were no bugs, but
> > >> misunderstandings, so we are asking users to write to the list first.
> > >>
> > >>> Using version 9.1 BaseX app, a GZIP archive of a JSON database can't
> > >>> be used to properly create a database.  Interestingly, a ZIP archive
> > >>> works fine.
> > >>
> > >> Do you really want to create a BaseX database from a "JSON database"?
> > >> If yes, which format has this database?
> > >>
> > >> Or does your archive contain a set of (tarred) JSON files, which you
> > >> would like to import in BaseX as XML? Did you try to rename your file
> > >> suffix to .tgz?
> > >>
> > >> Best,
> > >> Christian
> > >>
> > >>
> > >>
>


Re: [basex-talk] BUG: Can't parse JSON from GZIP archive

2018-11-25 Thread Rick Graham
Hi Christian,

Thanks for the reply.

I just wanted to use
https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.gz with
basexgui.  basexgui doesn't seem to process the archive correctly.

The archive
https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.zip seems to
be processed fine by basexgui.

This seems to be a basex/basexgui bug or at least a limitation, yes?

Regards,
RG

P.S.:  Regarding GitHub issues...  I know how to search those.  How do I
search past mailman threads?

On Sun, Nov 25, 2018 at 8:53 PM Christian Grün 
wrote:

> Hi Rick,
>
>> Would've filed an issue, but the request is to post here first.  (?)
>
> Thanks. Many GitHub issues in the past were no bugs, but
> misunderstandings, so we are asking users to write to the list first.
>
>> Using version 9.1 BaseX app, a GZIP archive of a JSON database can't be
>> used to properly create a database.  Interestingly, a ZIP archive works
>> fine.
>
> Do you really want to create a BaseX database from a "JSON database"? If
> yes, which format has this database?
>
> Or does your archive contain a set of (tarred) JSON files, which you would
> like to import in BaseX as XML? Did you try to rename your file suffix to
> .tgz?
>
> Best,
> Christian
>
>
>
>


Re: [basex-talk] BUG: Can't parse JSON from GZIP archive

2018-11-25 Thread Rick Graham
Would've filed an issue, but the request is to post here first.  (?)

Using version 9.1 BaseX app, a GZIP archive of a JSON database can't be
used to properly create a database.  Interestingly, a ZIP archive works
fine.

There is no error message; an empty database is simply created silently.

IDK if the GZIP problem is more widespread.